]>
Recently, NIH has conducted a number of Genome-Wide Association Studies (GWAS) that produced massive datasets containing subjects’ genetic makeup, labeled with clinical data including the occurrence of chronic diseases. Unfortunately, given the relatively small number of patients in such studies and the vast number of genes in the human genome, these datasets cannot be analyzed with traditional statistical predictive models.
Traditional models require a large number of samples (patients) with very few features per sample. My work attempts to solve this problem by employing state-of-the-art Machine Learning techniques. Over the past year I have built a software system capable of processing multi-terabyte-scale datasets, refactoring the NIH data into a form palatable to modern Big Data systems. I have run the initial stages of feature selection. I will present the current state of the work and future plans. Another goal of this work is to ensure the repeatability of the experiments and the flexibility to run against any similar dataset from current and future studies.
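To make the feature-selection step concrete, here is a minimal sketch of univariate feature screening for a "many features, few samples" dataset: score each feature by its correlation with the binary disease label and keep only the top-k features before fitting any predictive model. All names and data here are illustrative assumptions, not the actual NIH datasets or the system's real pipeline.

```python
import random

def point_biserial(feature, labels):
    """Correlation between a numeric feature column and a 0/1 label."""
    n = len(labels)
    n1 = sum(labels)          # count of positive (disease) labels
    n0 = n - n1               # count of negative labels
    if n0 == 0 or n1 == 0:
        return 0.0
    mean1 = sum(x for x, y in zip(feature, labels) if y == 1) / n1
    mean0 = sum(x for x, y in zip(feature, labels) if y == 0) / n0
    mean = sum(feature) / n
    var = sum((x - mean) ** 2 for x in feature) / n
    if var == 0:
        return 0.0
    return (mean1 - mean0) * ((n1 * n0) / n**2) ** 0.5 / var**0.5

def select_top_k(X, y, k):
    """X: list of samples, each a list of feature values; y: 0/1 labels.
    Returns the indices of the k features most correlated with y."""
    p = len(X[0])
    scores = []
    for j in range(p):
        column = [row[j] for row in X]
        scores.append((abs(point_biserial(column, y)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

# Illustrative use: 20 subjects, 51 features, where feature 0 happens
# to track the label; the screen should rank it at the top.
random.seed(0)
y = [i % 2 for i in range(20)]
X = [[float(label)] + [random.random() for _ in range(50)] for label in y]
selected = select_top_k(X, y, 5)
```

In a real GWAS setting this screen would run distributed over the full multi-terabyte dataset; the point here is only the shape of the computation, not its scale.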
]]>