The most cited bioinformatics papers of all time

The discovery of high-temperature superconductors, the determination of DNA’s double-helix structure, the first observations that the expansion of the Universe is accelerating — all of these breakthroughs won Nobel prizes and international acclaim. Yet none of the papers that announced them comes anywhere close to ranking among the 100 most highly cited papers of all time.
Citations, in which one paper refers to earlier works, are the standard means by which authors acknowledge the source of their methods, ideas and findings, and are often used as a rough measure of a paper’s importance.
Fifty years ago, Eugene Garfield published the Science Citation Index (SCI), the first systematic effort to track citations in the scientific literature. To mark the anniversary, Nature asked Thomson Reuters, which now owns the SCI, to list the 100 most highly cited papers of all time. (The full list is available as Web of Science Top 100.xls.) The search covered all of Thomson Reuters’ Web of Science, an online version of the SCI that also includes databases covering the social sciences, arts and humanities, conference proceedings and some books. It lists papers published from 1900 to the present day.

The rapid expansion of genetic sequencing since Sanger’s contribution has helped to boost the ranking of papers describing ways to analyse the sequences. A prime example is BLAST (Basic Local Alignment Search Tool), which for two decades has been a household name for biologists wanting to work out what genes and proteins do. Users simply have to open the program in a web browser and plug in a DNA, RNA or protein sequence. Within seconds, they will be shown related sequences from thousands of organisms, along with information about the function of those sequences and even links to relevant literature. So popular is BLAST that two versions of the program (refs 8, 9) feature on the list, at numbers 12 and 14.
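What BLAST reports as a "related sequence" is a high-scoring local alignment: a stretch of the query that lines up well with a stretch of a database sequence. BLAST gets its speed from heuristics that approximate this search; the exact version is the Smith-Waterman dynamic program, which the article itself does not describe. The Python below is a minimal sketch of that exact method, assuming simple illustrative scores (match +2, mismatch -1, linear gap -2) instead of the substitution matrices and affine gap penalties that real tools use. `smith_waterman` is a hypothetical helper name, not part of BLAST.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return the best local alignment score and one optimal aligned pair.

    Illustrative scoring only (assumed values), not BLAST's defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair_score(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + pair_score(i, j)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at zero lets an alignment restart anywhere: this is
            # what makes the alignment "local" rather than end-to-end.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score drops to zero.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("TTTACGTTT", "GGACGGG")` returns `(6, "ACG", "ACG")`: only the shared core is reported, and the unrelated flanks are ignored. This exhaustive table costs time proportional to the product of the two sequence lengths, which is why searching whole databases within seconds needs BLAST-style shortcuts.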
But owing to the vagaries of citation habits, BLAST has been bumped down the list by Clustal, a complementary program for aligning multiple sequences at once. Clustal allows researchers to describe the evolutionary relationships between sequences from different organisms, to find matches among seemingly unrelated sequences and to predict how a change at a specific point in a gene or protein might affect its function. A 1994 paper (ref. 10) describing ClustalW, a user-friendly version of the software, is currently number 10 on the list. A 1997 paper (ref. 11) on a later version called ClustalX is number 28.
The team that developed ClustalW, at the European Molecular Biology Laboratory in Heidelberg, Germany, had created the program to work on a personal computer, rather than a mainframe. But the software was transformed when Julie Thompson, a computer scientist from the private sector, joined the lab in 1991. “It was a program written by biologists; I’m trying to find a nice way to say that,” says Thompson, who is now at the Institute of Genetics and Molecular and Cellular Biology in Strasbourg, France. Thompson rewrote the program to ready it for the volume and complexity of the genome data being generated at the time, while also making it easier to use.
The teams behind BLAST and Clustal are competitive about the ranking of their papers. It is a friendly sort of competition, however, says Des Higgins, a biologist at University College Dublin, and a member of the Clustal team. “BLAST was a game-changer, and they’ve earned every citation that they get.”
Here Nature tours some of the key methods that tens of thousands of citations have hoisted to the top of science’s Kilimanjaro: essential, but rarely thrust into the limelight.

The most cited bioinformatics papers of all time:

Rank: 10 Citations: 40,289
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
Thompson, J. D., Higgins, D. G. & Gibson, T. J.
Nucleic Acids Res. 22, 4673–4680 (1994).
Rank: 12 Citations: 38,380
Basic local alignment search tool.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J.
J. Mol. Biol. 215, 403–410 (1990).
Rank: 14 Citations: 36,410
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Altschul, S. F. et al.
Nucleic Acids Res. 25, 3389–3402 (1997).
Rank: 20 Citations: 30,176
The neighbor-joining method: a new method for reconstructing phylogenetic trees.
Saitou, N. & Nei, M.
Mol. Biol. Evol. 4, 406–425 (1987).
Rank: 28 Citations: 24,098
The CLUSTAL_X Windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools.
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G.
Nucleic Acids Res. 25, 4876–4882 (1997).
Rank: 41 Citations: 21,373
Confidence limits on phylogenies: an approach using the bootstrap.
Felsenstein, J.
Evolution 39, 783–791 (1985).
Rank: 45 Citations: 18,286
Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0.
Tamura, K., Dudley, J., Nei, M. & Kumar, S.
Mol. Biol. Evol. 24, 1596–1599 (2007).
Rank: 57 Citations: 15,993
Maximum likelihood from incomplete data via the EM algorithm.
Dempster, A. P., Laird, N. M. & Rubin, D. B.
J. R. Stat. Soc., B 39, 1–38 (1977).
Rank: 71 Citations: 14,462
PROCHECK: a program to check the stereochemical quality of protein structures.
Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M.
J. Appl. Crystallogr. 26, 283–291 (1993).
Rank: 76 Citations: 14,099
MODELTEST: testing the model of DNA substitution.
Posada, D. & Crandall, K. A.
Bioinformatics 14, 817–818 (1998).
Rank: 82 Citations: 13,496
MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures.
Kraulis, P. J.
J. Appl. Crystallogr. 24, 946–950 (1991).


George Sheldrick, a chemist at the University of Göttingen in Germany, began to write software to help solve crystal structures in the 1970s. In those days, he says, “you couldn’t get grant money for that kind of project. My job was to teach chemistry, and I wrote the programs as a hobby in my spare time.” But over 40 years, his work gave rise to the regularly updated SHELX suite of computer programs, which has become one of the most popular tools for analysing the scattering patterns of X-rays that are shot through a crystal — thereby revealing the atomic structure.
The extent of that popularity became apparent after 2008, when Sheldrick published a review paper (ref. 24) about the history of the system, and noted that it might serve as a general literature citation whenever any of the SHELX programs were used. Readers followed his advice. In the past 6 years, that review paper has amassed almost 38,000 citations, catapulting it to number 13 and making it the highest-ranked paper published in the past two decades.
The top-100 list is scattered with other tools essential to crystallography and structural biology. These include papers describing the HKL suite (ref. 25; number 23) for analysing X-ray diffraction data; the PROCHECK programs (ref. 26; number 71) used to analyse whether a proposed protein structure seems geometrically normal or outlandish; and two programs (refs 27, 28) used to sketch molecular structures (numbers 82 and 95). These tools are the “bricks and mortar” for determining crystal structures, says Philip Bourne, associate director for data science at the US National Institutes of Health in Bethesda, Maryland.
An unusual entry, appearing at number 22, is a 1976 paper (ref. 29) from Robert Shannon, a researcher at the giant chemical firm DuPont in Wilmington, Delaware, who compiled a comprehensive list of the radii of ions in a series of different materials. Robin Grimes, a materials scientist at Imperial College London, says that physicists, chemists and theorists still cite this paper when they look up values of ionic size, which often correlate neatly with other properties of a substance. This has made it the highest-ranked formally cited database of all time.
“We often cite these kinds of papers almost without thinking about it,” says Paul Fossati, one of Grimes’s research colleagues. The same could be said for many of the methods and databases in the top 100. The list reveals just how powerfully research has been affected by computation and the analysis of large data sets. But it also serves as a reminder that the position of any particular methods paper or database at the top of the citation charts is partly down to luck and circumstance.
Still, there is one powerful lesson for researchers, notes Peter Moore, a chemist at Yale University in New Haven, Connecticut. “If citations are what you want,” he says, “devising a method that makes it possible for people to do the experiments they want at all, or more easily, will get you a lot further than, say, discovering the secret of the Universe”.
