In biology and medicine, not only has a lot of data already been generated, but technological advances are enabling us to gather more of it faster every year. Having a lot of data, however, does not necessarily mean we have any greater understanding of how things work.
Discovering how diseases begin and how they might be treated depends on putting the data in a context in which it can be understood, and then putting it in the hands of people who can use it. Human genetics is one example. Humans have about 25,000 genes, and we have data on the location of each one, but we still don’t know the function of about one-third of them.
In my lab, we focus on using computers to make sense of the “data deluge”: we find patterns within large databases and combine different sources of information to identify, and then experimentally verify, causal relationships buried within the data. Using this method, we’ve been able to predict the function of thousands of the human genes that still have no known function. With collaborators, we have tested the predicted functions of dozens of these genes in the lab and have discovered that several of them play important roles in immune cell movement, coagulation, breast cancer progression, DNA repair and cell division.
We hope to keep pushing the boundaries of what we know about human genetics with this new algorithm until the “Final Third” of our genome is no longer a mystery.