Geraint Duck, Goran Nenadic, Andy Brass, David L Robertson and Robert Stevens
Abstract
Background
Biology-focused databases and software define bioinformatics, and their use is central to
computational biology. In such a complex and dynamic field, it is of interest
to understand what resources are available, which are used, how much they are
used, and for what they are used. While scholarly literature surveys can
provide some insights, large-scale computer-based approaches to identify
mentions of bioinformatics databases and software from primary literature would
automate systematic cataloguing, facilitate the monitoring of usage, and
provide the foundations for the recovery of computational methods for analysing
biological data, with the long-term aim of identifying best/common practice in
different areas of biology.
Results
We have developed bioNerDS, a named entity recogniser for the recovery of
bioinformatics databases and software from primary literature. We identify such
entities with an F-measure ranging from 63% to 91% at the mention level and
from 63% to 78% at the document level, depending on the corpus. Not attaining a higher
F-measure is mostly due to high ambiguity in resource naming, which is
compounded by the on-going introduction of new resources. To demonstrate the
software, we applied bioNerDS to full-text articles from BMC Bioinformatics and
Genome Biology. General mention patterns reflect the remit of these journals,
highlighting BMC Bioinformatics's emphasis on new tools and Genome Biology's
greater emphasis on data analysis. The data also illustrate some shifts in
resource usage: for example, the past decade has seen R and the Gene Ontology
join BLAST and GenBank as the main components in bioinformatics processing.
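
As an illustration of how mention-level and document-level F-measures can differ, here is a minimal Python sketch. It assumes exact matching of annotations, which may not be the evaluation scheme used for bioNerDS; the function names and the toy annotations are hypothetical and are not taken from the article or the tool.

```python
def f_measure(gold, predicted):
    """Return (precision, recall, F1) for two collections of annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def to_document_level(mentions):
    """Collapse (doc_id, start, end, name) mentions to unique (doc_id, name) pairs."""
    return {(doc_id, name) for doc_id, _start, _end, name in mentions}


# Toy annotations, invented for illustration only.
gold_mentions = {
    ("doc1", 10, 15, "BLAST"),
    ("doc1", 40, 45, "BLAST"),
    ("doc1", 60, 67, "GenBank"),
    ("doc2", 5, 6, "R"),
}
predicted_mentions = {
    ("doc1", 10, 15, "BLAST"),
    ("doc1", 60, 67, "GenBank"),
    ("doc2", 30, 34, "Gene"),
}

# Mention level: every occurrence must be found at the right position.
mention_scores = f_measure(gold_mentions, predicted_mentions)

# Document level: finding a resource once per document is enough.
document_scores = f_measure(
    to_document_level(gold_mentions),
    to_document_level(predicted_mentions),
)

print("mention  P/R/F:", mention_scores)
print("document P/R/F:", document_scores)
```

In this toy case the missed second mention of BLAST lowers recall at the mention level but not at the document level, which is one reason the two scores can diverge.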
Conclusions
We demonstrate the feasibility of automatically identifying resource names on a
large scale from the scientific literature and show that the generated data can
be used for exploration of bioinformatics database and software usage. For
example, our results help to investigate the rate of change in resource usage
and corroborate the suspicion that a vast majority of resources are created,
but rarely (if ever) used thereafter. bioNerDS is available at http://bionerds.sourceforge.net/.
The complete article is available as a provisional
PDF. The fully formatted PDF and HTML versions are in
production.