TopCoder's Open Community Challenge Process Yields 970-Fold Increase in Speed for Big Data Genomics Sequencing Algorithm
Harvard Medical School Research Shows Prize-Based Competitions and Platforms Provide Unprecedented Levels of Accuracy in Solving Computational Biomedicine Problems
GLASTONBURY, Conn., Feb. 7, 2013 /PRNewswire/ -- TopCoder®, Inc., today announced the publication of a study showing how a community-based open innovation platform can solve complex biological problems more quickly than traditional approaches, and at a fraction of the current cost, potentially shifting the way basic science is conducted.

The peer-reviewed article "Prize-based contests can provide solutions to computational biology problems" appears in the February 7 issue of Nature Biotechnology and reports the findings of Eva Guinan, HMS associate professor of radiation oncology at Dana-Farber Cancer Institute; Karim Lakhani, associate professor in the Technology and Operations Management Unit at Harvard Business School; Ramy Arnaout, HMS assistant professor of pathology at Beth Israel Deaconess Medical Center; and Kevin Boudreau, assistant professor of strategy and entrepreneurship at London Business School.
Researchers identified a program that can analyze vast amounts of sequence data from the genes and gene mutations that build antibodies and T cell receptors. Because the immune system recombines a limited number of genes to fight a seemingly infinite number of invaders, predicting these genetic configurations has traditionally been a massive challenge, with few good solutions.
The solution sought through the competition and research was a tool that could calculate the edit distance between a query DNA sequence and the original (reference) DNA string. This was a real-world problem in which the limitations of existing tools had severely constrained researchers' ability to pioneer new advances in medical knowledge.
The researchers offered TopCoder what they thought would be an impossible goal: to develop a predictive algorithm that was an order of magnitude better than either a custom solution developed by Arnaout or the NIH's standard approach (MegaBLAST), and that could scale up to mounting data demands. To do this, they first had to reframe the problem, translating it so that it would be accessible to individuals not trained in computational biology. Among the 84 solutions produced by the TopCoder Community, 16 were an improvement over MegaBLAST, and one was more than 970 times faster than either benchmark. All were produced during a two-week competition costing just $6,000.
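The release does not say how contestants computed edit distance. As a minimal sketch of the underlying quantity only, the classic Levenshtein dynamic program below counts the fewest single-base substitutions, insertions and deletions needed to turn a query sequence into a reference. The function names `edit_distance` and `closest_reference`, the unit-cost scoring and the toy sequences are illustrative assumptions, not the contest's specification. This quadratic-time approach is a textbook baseline and makes no claim about how the winning entries achieved their speed-up.

```python
def edit_distance(query: str, reference: str) -> int:
    """Levenshtein distance with unit costs (an assumed scoring model)."""
    # prev[j] = distance between the first i-1 bases of query and the
    # first j bases of reference; only one DP row is kept at a time.
    prev = list(range(len(reference) + 1))
    for i, q in enumerate(query, start=1):
        curr = [i] + [0] * len(reference)
        for j, r in enumerate(reference, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete base q from the query
                curr[j - 1] + 1,         # insert base r into the query
                prev[j - 1] + (q != r),  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[-1]


def closest_reference(query: str, references: dict[str, str]) -> str:
    """Hypothetical helper: name of the reference with the smallest edit distance."""
    return min(references, key=lambda name: edit_distance(query, references[name]))


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))  # prints 1 (one deletion)
    toy_refs = {"V1": "ACGTACGT", "V2": "TTTTACGA"}  # made-up toy sequences
    print(closest_reference("ACGTACGA", toy_refs))  # prints V1
```

Keeping a single row of the table reduces memory from O(nm) to O(m), though run time stays proportional to the product of the two sequence lengths.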
"This is a proof-of-concept demonstration that we can bring people together not only from different schools and different disciplines, but from entirely different economic sectors, to solve problems that are bigger than one person, department or institution," said Eva Guinan, HMS associate professor of radiation oncology at Dana-Farber Cancer Institute and director of the Harvard Catalyst Linkages Program. "Given how complicated the immune system is, this has been a particularly formidable biological problem, and building tools for solving it has been hard and time-consuming. We were stunned by the power of these results and their potential application."
"In a way, the immune system is really the dark matter of biology," said Ramy Arnaout, assistant professor of pathology at Beth Israel Deaconess Medical Center. "We have all this sequence data, and there's no good way to figure out what it's doing. Not only did the best entries achieve truly superior performance, but also this kind of crowdsourcing has the potential to be a general solution for a whole class of problems in biology. No single university or institution has the bandwidth and resources to achieve this kind of result so quickly and efficiently."
The challenge drew 733 participants, of whom 122 (17%) submitted software code. These submitters, drawn from 69 countries, were roughly half (44%) professionals, with the remainder students at various levels. None were academic or industrial computational biologists, and only five described themselves as coming from R&D or the life sciences in any capacity. The 122 TopCoder members made 654 submissions representing 89 different approaches to the problem, an average of 5.4 submissions each. Participants reported spending an average of 22 hours developing solutions, for a total of 2,684 hours of development time. Sixteen of the submissions outperformed the accuracy (77%) of the traditionally developed custom solution, and 30 outperformed the accuracy of the NIH MegaBLAST benchmark (72%). Eight submissions achieved an accuracy score of 80%, which is very near the theoretical maximum for the dataset.
"We're excited to see that ideas from economics and management fields can be so productively applied to medical research," said Kevin Boudreau, assistant professor of strategy and entrepreneurship at London Business School. "This progress is heartening, particularly in view of the computational challenges we face in understanding so many diseases. We hope this provides a model of how social science and medical researchers can collaborate to solve real-world problems that matter to people."
According to Karim Lakhani, associate professor in the Technology and Operations Management Unit at Harvard Business School, it is not only the world of basic biomedical research that can benefit from this project, but any organization that is facing significant data analytics and computational challenges. "Our research with Harvard Catalyst and the NASA Tournament Lab initiative points to the applicability of deploying crowds as an innovation partner for extraordinarily difficult challenges where there are significant personnel and paradigmatic bottlenecks," he said. "This paper highlights the use of an alternative organizational form that is cost effective and productive. Many more organizations should also be considering how to effectively use crowds for problem solving."
The results achieved by contest participants in only fourteen days improved significantly upon the existing solutions available to academic researchers, decreasing processing time by up to three orders of magnitude with accuracy approaching the theoretical maximum. Although the solvers had virtually no domain-specific knowledge, abstracting the problem into general algorithmic and mathematical terms allowed a wide range of non-domain experts to address an important, complex problem. The contestants brought whatever skills and expertise they had or could find, likely yielding a far more diverse toolkit than would be available locally, and they generated significant diversity in technical approaches. Access to such diversity may be particularly important because big data biomedical analytics is a rapidly evolving field in which it is difficult to know a priori the kind, quality and breadth of expertise needed to generate an effective solution.
The work was funded by Harvard Business School's Division of Research and Faculty Development, the NASA Tournament Lab at Harvard's Institute for Quantitative Social Science, and Harvard Catalyst. The research effort also included invaluable insights and contributions from coauthors Po-Ru Loh (MIT), Lars Backstrom (TopCoder), Carliss Baldwin (HBS), Eric Lonstein (HBS), Mike Lydon (TopCoder) and Alan MacCormack (HBS).
About TopCoder, Inc.
TopCoder is the world's largest open innovation platform and competitive community of digital creators, with more than 450,000 members representing algorithmists, software developers and creative artists from over 200 countries. The TopCoder Platform and Community create digital assets, including analytics, software, and creative designs and solutions, for a wide-ranging client base through a competitive, rigorous, standards-based methodology. Combined with our extremely talented community, this groundbreaking methodology results in superior outcomes for our clients. For more information about sponsoring TopCoder events and utilizing TopCoder's software services and platforms, visit www.topcoder.com.
TopCoder is a registered trademark of TopCoder, Inc. in the United States and other countries. All other product and company names herein may be trademarks of their respective owners.
Jim McKeown
TopCoder, Inc. 
860.633.5540