Bioinformatics & computational biology is on IEEE's list of top technologies for 2022


Machine learning, computational biology, bioinformatics, nanotechnology and the Internet of things are on IEEE's list of top technologies for 2022.

Curious about what the technology landscape will look like in 2022? The Institute of Electrical and Electronics Engineers (IEEE), which represents more than 400,000 engineers, has come up with a report that looks to the future and predicts what the hot technologies of 2022 will be. Indeed, nine technologists led by IEEE Computer Society President Dejan Milojicic spent a large part of this year pondering this question. The results can be found in the IEEE CS 2022 Report, which looks at 23 future technologies that could change the world by 2022.

CS 2022 Report: IEEE Computer Society

"These technologies, tied into what we call seamless intelligence, present a view of the future," said IEEE’s Milojicic, in a statement. "Technology is the enabler. What humanity takes out of it really depends on human society."
  • The following contributed to sections of the report: Mohammed AlQuraishi, Harvard Medical School; Angela Burgess, IEEE Computer Society; David Forsyth, Cornell University; Hiroyasu Iwata, Waseda University; Rick McGeer, Communications and Design Group, SAP America; and John Walz, retired from Lucent/AT&T.

Background

The importance of computation in the acquisition, analysis, and modeling of biological systems has been steadily increasing for the past several decades. Contemporary bioinformatics and computational biology, twin fields divided roughly along the lines of data acquisition and analysis for the former and phenomenological modeling for the latter, comprise a strikingly wide range of topics and disciplines.

Owing to the intrinsic breadth of biological phenomena, which ranges from the molecular to the cellular to the organismal and ecological, computational biologists and bioinformaticists must grapple with a diverse set of problems and devise an equally diverse set of tools to solve them. As a result, a number of distinct subdisciplines have come to define the field, coarsely demarcated by the scale and type of phenomena they address.

Genomic Bioinformatics

The acquisition and analysis of genomic data comprise the field of genomic bioinformatics. This includes the initial acquisition of raw sequencing data, the interpretation and assembly of such data into partial and complete genomes, the analysis of sequenced genomes for statistical correlations indicative of diseases and other traits, and the mining of genomes for overrepresented motifs and other sequence features.
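As a toy illustration of the motif-mining step, the sketch below counts every overlapping k-mer in a sequence and flags those above a frequency threshold. The sequence, `k`, and threshold are invented for illustration; real motif finders compare counts against a background model rather than a raw cutoff.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def overrepresented(seq, k, min_count):
    """Return k-mers whose observed count meets a simple threshold."""
    return {kmer: n for kmer, n in kmer_counts(seq, k).items() if n >= min_count}

seq = "ACGTACGTACGTTTTT"
print(overrepresented(seq, 4, 3))  # 'ACGT' occurs 3 times in this toy sequence
```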

Structural Bioinformatics

The analysis, modeling, and simulation of biological macromolecules—namely, proteins, DNA (deoxyribonucleic acid), and RNA (ribonucleic acid)—comprise structural bioinformatics. The holy grail of the field has been, for several decades, the prediction of the three-dimensional structure of proteins from their amino acid sequence. A similar challenge remains open for RNA molecules. Beyond structure prediction, structural bioinformatics is concerned with the analysis and simulation of biomolecules to predict their interactions with other biomolecules and to infer useful physico-chemical properties.
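Comparing a predicted structure against an experimental one often reduces to a root-mean-square deviation (RMSD) over corresponding atoms. A minimal sketch, assuming the two coordinate sets are already superposed (a real pipeline would first align them, e.g. with the Kabsch algorithm); the coordinates below are invented:

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.
    Assumes the structures are already optimally superposed."""
    assert len(coords_a) == len(coords_b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(rmsd(a, b))  # sqrt(0.5), since one of the two atoms is displaced by 1 unit
```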

Systems Modeling

The modeling and simulation of a set of biological parts is the domain of systems biology. What constitutes an appropriate set for study can range from a small subsystem of a biological organism, such as a single signaling pathway, to an entire biological cell with its complete metabolic and transcriptional networks. A plethora of modeling and simulation techniques are typically employed, depending on the complexity of the underlying phenomena and the availability of experimental data.
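Many such pathway models are systems of ordinary differential equations. A minimal sketch, integrating a hypothetical reversible reaction A ⇌ B (forward rate `kf`, backward rate `kb`, both invented) with the Euler method; production simulators use stiff ODE solvers instead:

```python
def simulate(rates, y0, dt=0.01, steps=1000):
    """Euler integration of a toy two-species system A <-> B.
    rates = (kf, kb); y0 = initial concentrations (a, b)."""
    kf, kb = rates
    a, b = y0
    for _ in range(steps):
        flux = kf * a - kb * b   # net forward flux at this instant
        a -= flux * dt
        b += flux * dt           # mass is conserved exactly by construction
    return a, b

a, b = simulate((1.0, 0.5), (1.0, 0.0))
print(a, b)  # relaxes toward equilibrium a/b = kb/kf, i.e. (1/3, 2/3)
```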

Phylogenetics and Evolutionary Modeling

The phenomena of the three aforementioned fields, the submolecular, molecular, and supramolecular, can all be studied in light of evolution. Evolutionary genomics concerns the use and comparison of multiple genomes to infer functional regions that are more likely to be conserved over evolutionary timescales. The use of evolutionary analysis of structures similarly helps identify functional hotspots on biomolecules and informs the prediction of protein structure. Finally, the analysis of biological pathway evolution elucidates how the rewiring of cellular circuitry leads to new behaviors.
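A toy version of conservation analysis: score each column of a multiple sequence alignment by the fraction of sequences agreeing with its most common residue. The alignment below is invented and gapless; real pipelines weight sequences by similarity and compare against background residue frequencies.

```python
from collections import Counter

def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue at each
    column of a gapless multiple sequence alignment."""
    n = len(alignment)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*alignment)]

aln = ["ACDE",
       "ACDF",
       "ACGE"]
print(column_conservation(aln))  # first two columns fully conserved, last two 2/3
```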

Current State of the Field

While bioinformatics and computational biology constitute a broad field, genomic bioinformatics currently occupies an oversized role within the field. This has been driven by significant changes in both supply and demand over the past few years. On the supply side, progress in sequencing technology resulted in explosive growth in the availability of genomic sequences, with the rate of increase outpacing Moore's law for over a decade now. In 2000, the first human genome draft was completed at a cost of $3 billion after a 10-year effort.

Today, an entire human genome can be sequenced in less than a week and for less than $10,000. This abundance of sequence information, while a great scientific opportunity, has also created an unprecedented demand for new computing tools and infrastructure capable of analyzing enormous amounts of data. The trajectory of genomics is a classic example of a disruptive technology, particularly on the computational side. To underscore the point, the cost of computation in the overall sequencing pipeline has historically been fractional and inconsequential. As of 2010, however, the costliest aspect of the sequencing pipeline is the computational analysis required to turn raw data into completed genomes. This presents a tremendous challenge to bioinformaticists and computer scientists to develop new algorithms and computational infrastructures capable of keeping up with the unrelenting growth in genomic data predicted for the next several years.

Challenges

The explosive growth in the availability of genomic data, in particular when compared to other bioinformatics fields that have not benefited from similar data growth, has resulted in a high fraction of the bioinformatics effort being focused on solving sequencing problems. The overarching focus of this effort has been the acquisition and assembly of genomic data, but not necessarily its interpretation, as captured by the classic Cell article titled, "Sequence First, Ask Questions Later".

While this approach was appropriate during the initial stages of the genomic revolution, our ability to analyze genomic data now lags our ability to acquire it. One area where this is clear is genome-wide association studies, or GWAS. In such studies, a large number of patient genomes are sequenced, and individual genomic loci are tested for statistical correlations with diseases. Despite the initial high expectations for such studies, the current consensus is that most GWAS have been unsuccessful, because the typical strength of the disease correlations found has been very weak. So serious is the problem that it has acquired its own name, "missing heritability," which refers to the many diseases that are known to be heritable but whose precise genetic causes have escaped elucidation.

The causes of the so-called missing heritability are myriad, including a lack of sufficient data to provide the statistical power necessary to find very weak correlations. But equally important are the statistical and computational techniques used to mine genomic data, which were conceived in an era when hundreds, instead of trillions, of data points were the norm. Furthermore, such methods typically assume a simple, even linear, mapping between inputs (genomes) and outputs (phenotypes), when in reality the functions mapping human genomes to disease phenotypes are likely to be extremely complex.
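The per-locus test at the heart of a GWAS can be sketched as a Pearson chi-square statistic on a 2x2 table of allele carriers versus case/control status. The counts below are hypothetical, and real studies use more careful machinery (odds ratios, logistic regression, correction for population structure and multiple testing):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table:
    rows = allele present/absent, columns = case/control.
    Uses the closed form n*(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# hypothetical counts: allele carriers among 100 cases vs 100 controls
stat = chi_square_2x2(30, 70, 15, 85)
print(stat)  # compare against the chi-square distribution with 1 degree of freedom
```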

Where We Think It Will Go

The coming decade will see a shift in focus from genome acquisition to genome interpretation. This will likely be precipitated by three important developments.

Qualitative increase in data quantity.

Advances in sequencing technology continue to be made, and if the exponential trajectory is maintained, a 100- to 300-fold increase in the number of sequenced genomes by the end of the decade is possible. Such increases will provide a qualitative improvement in available statistical power.

Improved statistical methodologies.

Statistical inference methods designed specifically to tackle genomic bioinformatics will become increasingly common and will exploit the unique structure of genomic data to infer subtle correlations, particularly ones in which a disease depends on the state of multiple mutations.
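A crude probe for such multi-mutation dependence: compare disease frequency among carriers of each mutation alone versus carriers of both. Everything here (the genotypes, phenotypes, and the two loci) is invented for illustration; serious methods fit regression models with explicit interaction terms.

```python
def pairwise_risk(genotypes, phenotypes, i, j):
    """Disease frequency among carriers of both mutations i and j,
    versus carriers of each alone: a crude probe for interaction."""
    def freq(mask):
        hits = [p for g, p in zip(genotypes, phenotypes) if mask(g)]
        return sum(hits) / len(hits) if hits else 0.0
    return {
        "i_only": freq(lambda g: g[i] and not g[j]),
        "j_only": freq(lambda g: g[j] and not g[i]),
        "both":   freq(lambda g: g[i] and g[j]),
    }

# toy data: disease appears only when both loci are mutated
genos = [(1, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0)]
phenos = [0, 0, 1, 1, 0, 0]
print(pairwise_risk(genos, phenos, 0, 1))  # high risk only in the "both" group
```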

Convergence of genomic, structural, and systems approaches.

Perhaps most importantly, the currently separate fields of genomic, structural, and systems bioinformatics will converge. The underlying driving force behind this shift is the complex mapping function between genotypes and phenotypes. Even with improvements in statistical methodologies and increases in data sizes, if every human genome were sequenced, the scientific community would obtain around 10^10 genomes. In contrast, the mutational landscape of the human genome is around 4^3,000,000,000 in size.

Brute-force statistics and data acquisition will be insufficient to decode the human genotype-phenotype function. Instead, the interpretation of genomic data will need to proceed in a stepwise fashion, with the initial focus on understanding the molecular consequences of genomic changes. Doing so will require an understanding of how sequence determines structure, elevating structural bioinformatics to a central role in a disruptive manner. The types of analyses done within structural bioinformatics will be different from today's, as the emphasis shifts from coarse-grained prediction of de novo structures to the precise prediction of mutational effects on structure. The end result of this shift will be the convergence of genomic and structural bioinformatics.
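The scale mismatch between these two numbers can be made concrete with a little stdlib arithmetic, taking a 3-billion-base genome with 4 possible bases per position:

```python
import math

# ~10^10 genomes if every human were sequenced, versus 4**3_000_000_000
# possible genome sequences over a 3-billion-base alphabet-4 string.
log10_population = 10
log10_genome_space = 3_000_000_000 * math.log10(4)  # log10 of 4**3e9

# The *exponent* alone is ~1.8 billion; no dataset can tile this space.
print(f"genome space ~ 10^{log10_genome_space:.3g} vs dataset ~ 10^{log10_population}")
```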
As the ability to interpret genomic data molecularly improves, the next step will be to interpret genomic data in terms of systems-level phenotypes, at least on the pathway and cellular level. To do so will require that genotypes are first mapped onto structural phenotypes, which are then mapped onto systems phenotypes, in a bottom-up approach. In a similar vein to the first shift, understanding the effects of structural changes on system behavior will necessitate a move away from the study of individual biomolecules to the study of complexes of molecules and their interactions.

This is currently the domain of systems biology, but it is done in a top-down fashion in which high-level experimental data is used to fit observed systems-level phenomena, instead of a bottom-up approach in which known molecular interactions are simulated to obtain, in an emergent manner, the observed systems-level behavior. Achieving this will result in the convergence of structural and systems bioinformatics, where systems-scale structural simulations play a central role. Such a shift is already underway, although on a limited scale.

Cancer Modeling

Many types of cancers are caused by somatic mutations, i.e., mutations acquired during the lifetime of an individual, which disrupt important signaling pathways in human cells. Currently, many large-scale projects are underway to identify the specific mutations responsible for different types of cancers. These projects rely on acquiring a large number of tumor genomes and searching for overrepresented mutations that may be indicative of a causal role. Unfortunately, as described earlier, finding such causal links is difficult, as many cancers are affected through a large number of mutations acting in concert.

Furthermore, the disruptions caused by these mutations often affect multiple proteins in a signaling pathway, such that the integrative effect cannot be ascertained without a systems-level model of how the signaling pathway functions. The coming advances in structural and systems bioinformatics will make it possible to translate genomic data into molecular and systems phenotypes, and to establish a causal link between genotype and disease that may ultimately be disrupted therapeutically.
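The overrepresentation search can be sketched as counting how many independent tumors carry each mutation. The mutation labels below are well-known examples used purely as illustration, and the sample counts are invented; real projects also model background mutation rates before calling anything a driver.

```python
from collections import Counter

def recurrent_mutations(tumors, min_tumors):
    """Mutations seen in at least min_tumors samples. Recurrence across
    independent tumors hints at, but does not prove, a driver role."""
    counts = Counter(m for muts in tumors for m in set(muts))
    return {m: n for m, n in counts.items() if n >= min_tumors}

tumors = [
    {"KRAS:G12D", "TP53:R175H"},
    {"KRAS:G12D", "BRAF:V600E"},
    {"KRAS:G12D", "TP53:R175H"},
]
print(recurrent_mutations(tumors, 2))  # mutations recurring in 2+ tumors
```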

Polypharmacology

The development of therapeutic drugs is currently centered on finding a "target," typically a protein believed to play a causal role in a disease and whose activity is to be suppressed or enhanced. Much of the effort in medicinal chemistry is in finding drugs with a "clean" profile, i.e., ones that only affect their intended target while leaving all other proteins unperturbed. In the current era of one molecule, one disease, this approach makes sense. However, as our understanding of the complex interactions underlying disease states improves, therapeutic approaches will take on an increasingly polypharmacological bent, meaning they will by design target multiple molecules because the disease state is induced by multiple molecules.

Furthermore, even when a single molecule is targeted, understanding the polypharmacology of a drug is important, as some off-target interactions may be more problematic than others. The integration of structural and systems approaches will play a crucial role in making designed polypharmacology a reality. By enabling the analysis and simulation of a drug's molecular interactions with all proteins in a given pathway, its systems-level behavior can be predicted, and possibly designed. In addition, the information gained from a more sophisticated understanding of the basic science of disease will provide additional targets for drugs to act on.

Summary

The past decade has been an exciting time in bioinformatics and the life sciences broadly, as fundamental breakthroughs in technology have made it possible to amass unparalleled amounts of data. The core challenge of this and upcoming decades will be the translation of such data into actionable knowledge that can improve human health and shed light on the principal mysteries of life.

Much as mathematics, particularly group theory and topology, played a critical role in the development of 20th century physics, computation and machine learning are playing an analogous role in the development of 21st century biology. And much as physics proved to be a constant source of disruptive developments in the past century, it is likely that the intersection of computation and biology will play a similarly disruptive role in this and upcoming decades.
