How to use the m6AConquer database

This guide aims at offering a quick-start on how to access the data sets provided by m6AConquer database. The results provided by m6AConquer consist two parts: m6A site calling results and IDR analysis-based integration result. We will show you how to load and extract data from the downloaded files sequentially.

m6A site calling results

m6AConquer provides both MultiAssayExperiment (multi-omics) and SummarizedExperiment (single-omics) files for sharing quantitative m6A data. The MultiAssayExperiment (MAE) files contain the SummarizedExperiement (SE) for both site summary and expression level. You are free to use any of them.

To access the site calling results in MAE file format, please download the corresponding R object from the MultiAssayExperiment column. As an illustration, here we use the site calling result from GLORI sequencing data. First, load the SummarizedExperiment and MultiAssayExperiment packages and read the file into R:

suppressWarnings(suppressPackageStartupMessages(library(SummarizedExperiment)))
suppressWarnings(suppressPackageStartupMessages(library(MultiAssayExperiment)))
MAE <- readRDS("GLORI_WT_MAE.rds")
MAE

## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes.
##  Containing an ExperimentList class object of length 2:
##  [1] m6AOmics: RangedSummarizedExperiment with 2788705 rows and 4 columns
##  [2] TranscriptOmics: RangedSummarizedExperiment with 58650 rows and 4 columns
## Functionality:
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DataFrame
##  sampleMap() - the sample coordination DataFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DataFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
##  exportClass() - save data to flat files

The MAE object contains both the site summary and expression levels.

experiments(MAE)

## ExperimentList class object of length 2:
##  [1] m6AOmics: RangedSummarizedExperiment with 2788705 rows and 4 columns
##  [2] TranscriptOmics: RangedSummarizedExperiment with 58650 rows and 4 columns

Assay Data

To access the m6A-Omics data, you may extract the m6AOmics experiment:

m6AOmics <- experiments(MAE)[["m6AOmics"]]
m6AOmics

## class: RangedSummarizedExperiment 
## dim: 2788705 4 
## metadata(1): OmixM6A_para
## assays(4): m6A Total m6ASiteProb AdjPvalue
## rownames: NULL
## rowData names(2): high_confidence_union high_confidence_specific
## colnames(4): SRR21356198 SRR21356199 SRR21356250 SRR21356251
## colData names(31): Status Title ... SRA gsm

This is basically a SummarizedExperiment object with four assay matrices:

m6A: m6A read coverages
Total: Total read coverages
m6ASiteProb: m6A site probabilities as normalized m6A levels
AdjPvalue: Benjamini-Hochberg method corrected p-values (FDR) returned by sample-specific site calling

If you have downloaded the SE objects from the corresponding columns, you can also use the codes on MAE experiments to extract data of interest.

Each of these assays can be access as follow:

## Here, we extract the sites with total m6A coverages across various samples larger than 30
index <- rowSums(assays(m6AOmics)[["m6A"]]) > 30
head(assays(m6AOmics)[["m6A"]][index,])

##      SRR21356198 SRR21356199 SRR21356250 SRR21356251
## [1,]           5           8          96         107
## [2,]          35          30         106          72
## [3,]          11          12          66          66
## [4,]           2           0          14          20
## [5,]           0           0          20          23
## [6,]           1           1          20          17

head(assays(m6AOmics)[["Total"]][index,])

##      SRR21356198 SRR21356199 SRR21356250 SRR21356251
## [1,]        1603        1280        3436        3292
## [2,]        1601        1272        3693        3502
## [3,]         853         746        5552        5315
## [4,]         146         104        1506        1498
## [5,]         156         109        1819        1758
## [6,]         155         108        2479        2437

head(assays(m6AOmics)[["m6ASiteProb"]][index,])

##      SRR21356198 SRR21356199 SRR21356250 SRR21356251
## [1,]  0.16747946  0.11371423  0.09302264  0.12464093
## [2,]  0.08064296  0.08052901  0.09851074  0.06021041
## [3,]  0.08606274  0.08173375  0.07641284  0.07309876
## [4,]  0.09286189  0.20339818  0.10589075  0.06977205
## [5,]  0.24320226  0.20720083  0.08483256  0.07063602
## [6,]  0.11517010  0.10382544  0.13426555  0.17830780

head(assays(m6AOmics)[["AdjPvalue"]][index,])

##      SRR21356198 SRR21356199 SRR21356250 SRR21356251
## [1,]           1           1   0.5864301   0.4063583
## [2,]           1           1   0.5439526   1.0000000
## [3,]           1           1   1.0000000   1.0000000
## [4,]           1           1   1.0000000   1.0000000
## [5,]           1           1   1.0000000   1.0000000
## [6,]           1           1   1.0000000   1.0000000

To access the expression levels of GLORI sequencing data, you may extract the TranscriptOmics experiment:

TranscriptOmics <- experiments(MAE)[["TranscriptOmics"]]
TranscriptOmics

## class: RangedSummarizedExperiment 
## dim: 58650 4 
## metadata(0):
## assays(2): ReadCounts RPKM
## rownames(58650): ENSG00000223972 ENSG00000227232 ... ENSG00000231514
##   ENSG00000235857
## rowData names(6): gene_id gene_name ... symbol entrezid
## colnames(4): SRR21356198 SRR21356199 SRR21356250 SRR21356251
## colData names(31): Status Title ... SRA gsm

This SE only contains two assay, which are the original read counts on all the genes and the normalized RPKM:

head(assays(TranscriptOmics)[["ReadCounts"]])

##                 SRR21356198 SRR21356199 SRR21356250 SRR21356251
## ENSG00000223972           1           3          44          32
## ENSG00000227232          43          39         556         510
## ENSG00000278267           0           0           0           0
## ENSG00000243485           1           0           2           1
## ENSG00000237613           1           0           0           0
## ENSG00000268020           0           0           0           2

head(assays(TranscriptOmics)[["RPKM"]])

##                 SRR21356198 SRR21356199 SRR21356250 SRR21356251
## ENSG00000223972  0.03563207   0.1269188 0.089399386 0.068082910
## ENSG00000227232  1.96767639   2.1189148 1.450777398 1.393485443
## ENSG00000278267  0.00000000   0.0000000 0.000000000 0.000000000
## ENSG00000243485  0.06055009   0.0000000 0.006905348 0.003615446
## ENSG00000237613  0.05071505   0.0000000 0.000000000 0.000000000
## ENSG00000268020  0.00000000   0.0000000 0.000000000 0.008788977

Rowrange data

To access the consistent genomic ranges used to count m6A and total read coverages, you can extract the Rowrange data from m6AOmics:

rowRanges(m6AOmics)

## GRanges object with 2788705 ranges and 2 metadata columns:
##             seqnames    ranges strand | high_confidence_union
##                <Rle> <IRanges>  <Rle> |             <numeric>
##         [1]        1     11873      + |                     0
##         [2]        1     11896      + |                     0
##         [3]        1     11940      + |                     0
##         [4]        1     12017      + |                     0
##         [5]        1     12147      + |                     0
##         ...      ...       ...    ... .                   ...
##   [2788701]       MT     14324      - |                     0
##   [2788702]       MT     15023      - |                     0
##   [2788703]       MT     15413      - |                     0
##   [2788704]       MT     15639      - |                     0
##   [2788705]       MT     15968      - |                     0
##             high_confidence_specific
##                            <numeric>
##         [1]                        0
##         [2]                        0
##         [3]                        0
##         [4]                        0
##         [5]                        0
##         ...                      ...
##   [2788701]                        0
##   [2788702]                        0
##   [2788703]                        0
##   [2788704]                        0
##   [2788705]                        0
##   -------
##   seqinfo: 25 sequences (1 circular) from GRCh38 genome

The metacolumn in the rowrange data of m6AOmics are the dummy variables indicating if the sites are supported by technical orthogonal integration. high_confidence_union indicates if the site is supported by integration of any orthogonal technique pair, namely high-confidence m6A sites. high_confidence_specific indicates if the site is supported by the integration of any orthogonal technique pair covering current technqiue.

To access the genomic locations of all the genes, you can extract the Rowrange from TranscriptOmics:

rowRanges(TranscriptOmics)

## GRanges object with 58650 ranges and 6 metadata columns:
##                   seqnames            ranges strand |         gene_id
##                      <Rle>         <IRanges>  <Rle> |     <character>
##   ENSG00000223972     chr1       11869-14409      + | ENSG00000223972
##   ENSG00000227232     chr1       14404-29570      - | ENSG00000227232
##   ENSG00000278267     chr1       17369-17436      - | ENSG00000278267
##   ENSG00000243485     chr1       29554-31109      + | ENSG00000243485
##   ENSG00000237613     chr1       34554-36081      - | ENSG00000237613
##               ...      ...               ...    ... .             ...
##   ENSG00000224240     chrY 26549425-26549743      + | ENSG00000224240
##   ENSG00000227629     chrY 26586642-26591601      - | ENSG00000227629
##   ENSG00000237917     chrY 26594851-26634652      - | ENSG00000237917
##   ENSG00000231514     chrY 26626520-26627159      - | ENSG00000231514
##   ENSG00000235857     chrY 56855244-56855488      + | ENSG00000235857
##                     gene_name           gene_biotype seq_coord_system
##                   <character>            <character>      <character>
##   ENSG00000223972     DDX11L1 transcribed_unproces..       chromosome
##   ENSG00000227232      WASH7P unprocessed_pseudogene       chromosome
##   ENSG00000278267   MIR6859-1                  miRNA       chromosome
##   ENSG00000243485   MIR1302-2                lincRNA       chromosome
##   ENSG00000237613     FAM138A                lincRNA       chromosome
##               ...         ...                    ...              ...
##   ENSG00000224240     CYCSP49   processed_pseudogene       chromosome
##   ENSG00000227629  SLC25A15P1 unprocessed_pseudogene       chromosome
##   ENSG00000237917     PARP4P1 unprocessed_pseudogene       chromosome
##   ENSG00000231514     FAM58CP   processed_pseudogene       chromosome
##   ENSG00000235857     CTBP2P1   processed_pseudogene       chromosome
##                        symbol                       entrezid
##                   <character>                         <list>
##   ENSG00000223972     DDX11L1 100287596,100287102,727856,...
##   ENSG00000227232      WASH7P                           <NA>
##   ENSG00000278267   MIR6859-1                      102466751
##   ENSG00000243485   MIR1302-2            105376912,100302278
##   ENSG00000237613     FAM138A           654835,645520,641702
##               ...         ...                            ...
##   ENSG00000224240     CYCSP49                           <NA>
##   ENSG00000227629  SLC25A15P1                           <NA>
##   ENSG00000237917     PARP4P1                           <NA>
##   ENSG00000231514     FAM58CP                           <NA>
##   ENSG00000235857     CTBP2P1                           <NA>
##   -------
##   seqinfo: 25 sequences (1 circular) from hg38 genome

Sample annotation

The sample annotations provided by data generators are also provided in MAE. For most of the sequencing techniques, the sample annotations for site summary SE and expression level SE are identical. For antibody-assisted techniques, the sample annotations for site summary are from IP samples, and those for expression levels are from input samples. If you access sample annotation from MAE object, you can obtain a complete sample annotation tables for both IP and input samples. Here, GLORI is a chemical-assisted techniques. Therefore, their sample annotations are the same.

head(colData(MAE))

## DataFrame with 4 rows and 31 columns
##                             Status                  Title Sample.type
##                        <character>            <character> <character>
## SRR21356198 Public on Aug 06, 2022 HEK293T_Takara_GLORI..         SRA
## SRR21356199 Public on Aug 06, 2022 HEK293T_Takara_GLORI..         SRA
## SRR21356250 Public on Aug 06, 2022    HEK293T_GLORI+_rep2         SRA
## SRR21356251 Public on Aug 06, 2022    HEK293T_GLORI+_rep1         SRA
##             Source.name     Organism        Characteristics
##             <character>  <character>            <character>
## SRR21356198     HEK293T Homo sapiens cell line: HEK293Tce..
## SRR21356199     HEK293T Homo sapiens cell line: HEK293Tce..
## SRR21356250     HEK293T Homo sapiens cell line: HEK293Tce..
## SRR21356251     HEK293T Homo sapiens cell line: HEK293Tce..
##                 Treatment.protocol        Growth.protocol Extracted.molecule
##                        <character>            <character>        <character>
## SRR21356198 Transfection was per.. HEK293T, MEF, HeLa c..          polyA RNA
## SRR21356199 Transfection was per.. HEK293T, MEF, HeLa c..          polyA RNA
## SRR21356250 Transfection was per.. HEK293T, MEF, HeLa c..          polyA RNA
## SRR21356251 Transfection was per.. HEK293T, MEF, HeLa c..          polyA RNA
##                Extraction.protocol Library.strategy Library.source
##                        <character>      <character>    <character>
## SRR21356198 mRNA was purified by..          RNA-Seq transcriptomic
## SRR21356199 mRNA was purified by..          RNA-Seq transcriptomic
## SRR21356250 mRNA was purified by..          RNA-Seq transcriptomic
## SRR21356251 mRNA was purified by..          RNA-Seq transcriptomic
##             Library.selection Instrument.model            Description
##                   <character>      <character>            <character>
## SRR21356198              cDNA      HiSeq X Ten wildtype HEK293T cel..
## SRR21356199              cDNA      HiSeq X Ten wildtype HEK293T cel..
## SRR21356250              cDNA      HiSeq X Ten wildtype HEK293T cel..
## SRR21356251              cDNA      HiSeq X Ten wildtype HEK293T cel..
##                    Data.processing Submission.date Last.update.date
##                        <character>     <character>      <character>
## SRR21356198 Adapters and low-qua..    Aug 04, 2022     Oct 05, 2022
## SRR21356199 Adapters and low-qua..    Aug 04, 2022     Oct 05, 2022
## SRR21356250 Adapters and low-qua..    Aug 04, 2022     Oct 05, 2022
## SRR21356251 Adapters and low-qua..    Aug 04, 2022     Oct 05, 2022
##             Contact.name         E.mail.s. Organization.name
##              <character>       <character>       <character>
## SRR21356198       Kai Li li_kai@pku.edu.cn Peking University
## SRR21356199       Kai Li li_kai@pku.edu.cn Peking University
## SRR21356250       Kai Li li_kai@pku.edu.cn Peking University
## SRR21356251       Kai Li li_kai@pku.edu.cn Peking University
##                         Department         Lab         Street.address
##                        <character> <character>            <character>
## SRR21356198 School of Life Science      Yi Lab Yi He Yuan Street No.5
## SRR21356199 School of Life Science      Yi Lab Yi He Yuan Street No.5
## SRR21356250 School of Life Science      Yi Lab Yi He Yuan Street No.5
## SRR21356251 School of Life Science      Yi Lab Yi He Yuan Street No.5
##                    City ZIP.Postal.code     Country Platform.ID    BioSample
##             <character>       <integer> <character> <character>  <character>
## SRR21356198     BeiJing          100871       China    GPL20795 SAMN30155964
## SRR21356199     BeiJing          100871       China    GPL20795 SAMN30155965
## SRR21356250     BeiJing          100871       China    GPL20795 SAMN30156016
## SRR21356251     BeiJing          100871       China    GPL20795 SAMN30156017
##                     SRA         gsm
##             <character> <character>
## SRR21356198 SRX17361992  GSM6432643
## SRR21356199 SRX17361991  GSM6432642
## SRR21356250 SRX17361940  GSM6432591
## SRR21356251 SRX17361939  GSM6432590

Metadata

Finally, the MultiAssayExperiment object also contains fitted parameters of the site calling models.

metadata(MAE)

## $OmixM6A_para
##             bg_proportion fg_proportion alpha_m6A_bg beta_m6A_bg alpha_m6A_fg
## SRR21356198     0.7143169     0.2856831     1.280619    45.11532    0.4055157
## SRR21356199     0.7185518     0.2814482     1.270012    43.87573    0.4051288
## SRR21356250     0.6212593     0.3787407     4.506799   224.56290    0.4858578
## SRR21356251     0.6275281     0.3724719     4.570400   222.54540    0.4880603
##             beta_m6A_fg
## SRR21356198    1.080324
## SRR21356199    1.072557
## SRR21356250    1.404342
## SRR21356251    1.402531

This can also be accessed through the metadata of site summary SE:

metadata(m6AOmics)

## $OmixM6A_para
##             bg_proportion fg_proportion alpha_m6A_bg beta_m6A_bg alpha_m6A_fg
## SRR21356198     0.7143169     0.2856831     1.280619    45.11532    0.4055157
## SRR21356199     0.7185518     0.2814482     1.270012    43.87573    0.4051288
## SRR21356250     0.6212593     0.3787407     4.506799   224.56290    0.4858578
## SRR21356251     0.6275281     0.3724719     4.570400   222.54540    0.4880603
##             beta_m6A_fg
## SRR21356198    1.080324
## SRR21356199    1.072557
## SRR21356250    1.404342
## SRR21356251    1.402531

IDR analysis-based integration result

m6AConquer provides both Genomic Ranges and CSV files for IDR analysis-based integration result. Both file types contain the same content. Currently, we only provide the individual and merged integration results across orthogonal techniques. You are free to use any of them.

For each technique pair, we also provide two kinds of integration results: positive reproducible sites and all reproducible sites.

Positive Reproducible Site: Includes reproducible sites with site calling FDR < 0.05
All Reproducible Site: Includes all reproducible sites (IDR < 0.05)

Genomic Ranges

To access the individual integration results in Genomic Ranges file format, please download the corresponding R object from the corresponding column. As an illustration, here we use the positive integration results between GLORI and eTAM-seq. First, load the GenomicRanges packages and read the file into R:

suppressWarnings(suppressPackageStartupMessages(library(GenomicRanges)))
gr <- readRDS("m6A_high_confidence_eTAM_seq&GLORI.rds")
head(gr)

## GRanges object with 6 ranges and 7 metadata columns:
##       seqnames    ranges strand | m6A_ratio_eTAM-seq   m6A_ratio_GLORI
##          <Rle> <IRanges>  <Rle> |        <character>       <character>
##   [1]    chr22  31279336      * |  0.525547445255474 0.394285714285714
##   [2]     chr2 169573658      * |  0.863636363636364 0.940594059405941
##   [3]    chr10  97456897      * |  0.760191846522782 0.805139186295503
##   [4]    chr12  45727259      * |   0.96969696969697  0.93859649122807
##   [5]     chr2  46759468      * |                  1 0.936708860759494
##   [6]     chr7   2730214      * |  0.521885521885522 0.617563739376771
##       m6A_probability_eTAM-seq m6A_probability_GLORI Pvalue_adjusted_eTAM-seq
##                    <character>           <character>              <character>
##   [1]        0.999999999996195                     1    -6.01961937441581e-11
##   [2]        0.999999999999598                     1    -1.53502357499078e-11
##   [3]                        1                     1     1.67802757017329e-11
##   [4]                        1                     1     -3.1630979718438e-11
##   [5]                        1                     1     2.41069862922253e-12
##   [6]        0.999999999999863                     1     1.02874903255979e-11
##       Pvalue_adjusted_GLORI Irreproducible_discovery_rate
##                 <character>                   <character>
##   [1]  1.05850158497544e-11          2.67563748934663e-14
##   [2]   -3.589161085134e-14          6.08402217494586e-14
##   [3]  7.29093266298641e-11          9.04831765069503e-14
##   [4]  2.21052712374319e-12          9.22595333463505e-14
##   [5] -5.70916743768332e-12          9.68114477473137e-14
##   [6]  7.56542394679467e-12          1.00586206031039e-13
##   -------
##   seqinfo: 25 sequences from an unspecified genome; no seqlengths

The Genomic Ranges file shows the genomic locations of all the positive reproducible sites identified by IDR between GLORI and eTAM-seq, with a total site number of 91979. The meta columns of the results contain the m6A methylation ratios, posterior probabilities, and BH corrected p-values modelled in each technique, along with the IDR value evaluated.

To access the merged integration results, you may load the merged genomic ranges file into R:

gr_merged <- readRDS("m6A_high_confidence_overall.rds")
head(gr_merged)

## GRanges object with 6 ranges and 2 metadata columns:
##       seqnames    ranges strand | support_number      support_technique
##          <Rle> <IRanges>  <Rle> |      <integer>            <character>
##   [1]     chr1    139472      * |              1         eTAM-seq&GLORI
##   [2]     chr1    185038      * |              2 GLORI&m6A-SAC-seq, G..
##   [3]     chr1    185233      * |              1        GLORI&m6ACE-seq
##   [4]     chr1    632756      * |              1         eTAM-seq&GLORI
##   [5]     chr1    826939      * |              2 eTAM-seq&GLORI, GLOR..
##   [6]     chr1    841507      * |              1        GLORI&m6ACE-seq
##   -------
##   seqinfo: 25 sequences from an unspecified genome; no seqlengths

The meta columns of the merged results contain the names and numbers of supported orthogonal technique pair. Totally, 146250 reproducible sites are identified across all orthogonal technique pairs.

CSV

To access the integration results through CSV tables, simply read the CSV tables into R:

csv <- read.csv("m6A_high_confidence_eTAM_seq&GLORI.csv", row.names = NULL)
head(csv)

##   seqnames     start       end width strand m6A_ratio_eTAM_seq m6A_ratio_GLORI
## 1    chr22  31279336  31279336     1      *          0.5255474       0.3942857
## 2     chr2 169573658 169573658     1      *          0.8636364       0.9405941
## 3    chr10  97456897  97456897     1      *          0.7601918       0.8051392
## 4    chr12  45727259  45727259     1      *          0.9696970       0.9385965
## 5     chr2  46759468  46759468     1      *          1.0000000       0.9367089
## 6     chr7   2730214   2730214     1      *          0.5218855       0.6175637
##   m6A_probability_eTAM_seq m6A_probability_GLORI Pvalue_adjusted_eTAM_seq
## 1                        1                     1            -6.019619e-11
## 2                        1                     1            -1.535024e-11
## 3                        1                     1             1.678028e-11
## 4                        1                     1            -3.163098e-11
## 5                        1                     1             2.410699e-12
## 6                        1                     1             1.028749e-11
##   Pvalue_adjusted_GLORI Irreproducible_discovery_rate
## 1          1.058502e-11                  2.675637e-14
## 2         -3.589161e-14                  6.084022e-14
## 3          7.290933e-11                  9.048318e-14
## 4          2.210527e-12                  9.225953e-14
## 5         -5.709167e-12                  9.681145e-14
## 6          7.565424e-12                  1.005862e-13

csv_merged <- read.csv("m6A_high_confidence_overall.csv", row.names = NULL)
head(csv_merged)

##   seqnames  start    end width strand support_number
## 1     chr1 139472 139472     1      *              1
## 2     chr1 185038 185038     1      *              2
## 3     chr1 185233 185233     1      *              1
## 4     chr1 632756 632756     1      *              1
## 5     chr1 826939 826939     1      *              2
## 6     chr1 841507 841507     1      *              1
##                    support_technique
## 1                     eTAM-seq&GLORI
## 2 GLORI&m6A-SAC-seq, GLORI&m6ACE-seq
## 3                    GLORI&m6ACE-seq
## 4                     eTAM-seq&GLORI
## 5    eTAM-seq&GLORI, GLORI&m6ACE-seq
## 6                    GLORI&m6ACE-seq