lncRNAtor: a comprehensive resource

lncRNAtor aims to be the lncRNA (long non-coding RNA) portal encompassing expression profiles, interacting (binding) proteins, integrated sequence curation, evolutionary scores, and coding potential. Data sets were collected from TCGA, GEO, ENCODE, and modENCODE (organisms: human, mouse, fly, worm, and yeast). Our system workflow is shown below.
[Figure: lncRNAtor system workflow]

Major features:

Representative features of lncRNAtor are summarized as follows:

Annotation and sequence analysis:

We have collected lncRNAs for 6 species (human, mouse, zebrafish, fruit fly, worm, yeast) from ENSEMBL, HGNC, MGI, and lncRNAdb. Each lncRNA was analyzed for phylogenetic conservation and coding potential. Lineage-specific conservation scores were obtained from UCSC phastCons data to identify lncRNAs that are functional only in specific lineages.

Expression analysis:

RNA-Seq data provide expression levels of lncRNAs and mRNAs in an unbiased manner. We have collected 208 RNA-Seq studies (6295 samples) from the GEO, ENCODE, modENCODE, and TCGA databases. Each sample was classified by tissue, disease, drug, or developmental stage according to the study design. Cufflinks was used to quantify the expression levels of lncRNAs and mRNAs, and Cuffdiff was used to identify differentially expressed lncRNAs. The expression profiles and differential expression are visualized as heatmaps and box plots, respectively.

Functional analysis:

The molecular functions of most lncRNAs remain to be uncovered. Protein-coding mRNAs that are coexpressed with an lncRNA of interest often provide ample insight into its function. By analyzing RNA-Seq data, we provide lists of coexpressed mRNAs in various conditions. These lists can be further used for GO (gene ontology) or pathway analysis, which provides functional insights. All these processes are seamlessly integrated in the web service.
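The coexpression step described above can be sketched as follows. This is a minimal illustration, not the lncRNAtor implementation: the choice of Pearson correlation, the function names (pearson, coexpressed_mrnas), the 0.9 cutoff, and the toy expression values are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length profiles with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation is undefined, treat as 0
    return cov / sqrt(var_x * var_y)


def coexpressed_mrnas(lncrna_profile, mrna_profiles, min_r=0.9):
    """Return (gene, r) pairs with r >= min_r, strongest correlation first.

    min_r is a hypothetical cutoff chosen for illustration only.
    """
    hits = []
    for gene, profile in mrna_profiles.items():
        r = pearson(lncrna_profile, profile)
        if r >= min_r:
            hits.append((gene, r))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy expression values across four samples (made-up numbers).
lnc = [1.0, 2.0, 3.0, 4.0]
mrnas = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],  # rises with the lncRNA
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # falls as the lncRNA rises
    "GENE_C": [1.0, 3.0, 2.0, 4.0],  # only loosely follows the lncRNA
}
result = coexpressed_mrnas(lnc, mrnas)
print(result)
```

The resulting gene list is what would then be passed on to a GO or pathway enrichment tool.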

Protein-binding analysis:

Another important aspect of lncRNA function is protein-lncRNA interaction. We compiled all publicly available CLIP-Seq and PAR-CLIP sequencing experiments in the GEO database. Although the current dataset is rather limited in size (57 RNA-binding proteins and 280 samples), it includes many important RNA-binding proteins such as AGO, LIN28, hnRNPs, and HuR (ELAVL1).

Identification of HMZ conserved and coexpressed lncRNAs:

To demonstrate the utility of our functional analysis, we searched for lncRNAs that were conserved between orthologous genomes and whose expression patterns were highly correlated between orthologs. These lncRNAs are evolutionarily conserved in both sequence and expression pattern, and are therefore expected to play important biological roles. The three HMZ organisms (human, mouse, zebrafish) were chosen because lncRNA expression data are available for many tissues. We obtained ~500 and ~200 such lncRNAs conserved between human-mouse and human-zebrafish orthologs, respectively.
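The two-part filter described above (sequence conservation plus correlated expression across matched tissues) can be sketched as follows. This is a hedged illustration under stated assumptions: the source does not specify the correlation measure or any cutoffs, so the use of Spearman rank correlation, the record fields (lncrna, conservation, human_expr, ortholog_expr), the thresholds, and the toy values are all hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation is undefined, treat as 0
    return cov / (var_x * var_y) ** 0.5


def conserved_coexpressed(pairs, min_conservation=0.5, min_rho=0.8):
    """Keep ortholog pairs passing both the conservation and expression cutoffs.

    Both thresholds are hypothetical values chosen for illustration.
    """
    selected = []
    for pair in pairs:
        if pair["conservation"] < min_conservation:
            continue
        rho = spearman(pair["human_expr"], pair["ortholog_expr"])
        if rho >= min_rho:
            selected.append((pair["lncrna"], rho))
    return selected


# Toy records; expression is listed for the same four tissues in both species.
pairs = [
    {"lncrna": "LNC_1", "conservation": 0.9,
     "human_expr": [10, 2, 5, 1], "ortholog_expr": [80, 15, 30, 3]},
    {"lncrna": "LNC_2", "conservation": 0.2,  # fails the conservation cutoff
     "human_expr": [10, 2, 5, 1], "ortholog_expr": [80, 15, 30, 3]},
    {"lncrna": "LNC_3", "conservation": 0.8,  # conserved, but expression diverges
     "human_expr": [10, 2, 5, 1], "ortholog_expr": [1, 9, 4, 7]},
]
selected = conserved_coexpressed(pairs)
print(selected)
```

Rank correlation is used here because absolute expression levels are not directly comparable between species; any other correlation measure could be substituted.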
