This guide aims at offering a quick-start on how to access the data sets provided by m6AConquer database. The results provided by m6AConquer consist two parts: m6A site calling results and IDR analysis-based integration result. We will show you how to load and extract data from the downloaded files sequentially.
m6AConquer provides both MultiAssayExperiment (multi-omics) and SummarizedExperiment (single-omics) files for sharing quantitative m6A data. The MultiAssayExperiment (MAE) files contain the SummarizedExperiement (SE) for both site summary and expression level. You are free to use any of them.
To access the site calling results in MAE file format, please download the corresponding R object from the MultiAssayExperiment column. As an illustration, here we use the site calling result from GLORI sequencing data. First, load the SummarizedExperiment and MultiAssayExperiment packages and read the file into R:
suppressWarnings(suppressPackageStartupMessages(library(SummarizedExperiment)))
suppressWarnings(suppressPackageStartupMessages(library(MultiAssayExperiment)))
MAE <- readRDS("GLORI_WT_MAE.rds")
MAE
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] m6AOmics: RangedSummarizedExperiment with 2788705 rows and 4 columns
## [2] TranscriptOmics: RangedSummarizedExperiment with 58650 rows and 4 columns
## Functionality:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample coordination DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
## exportClass() - save data to flat files
The MAE object contains both the site summary and expression levels.
experiments(MAE)
## ExperimentList class object of length 2:
## [1] m6AOmics: RangedSummarizedExperiment with 2788705 rows and 4 columns
## [2] TranscriptOmics: RangedSummarizedExperiment with 58650 rows and 4 columns
To access the m6A-Omics data, you may extract the
m6AOmics
experiment:
m6AOmics <- experiments(MAE)[["m6AOmics"]]
m6AOmics
## class: RangedSummarizedExperiment
## dim: 2788705 4
## metadata(1): OmixM6A_para
## assays(4): m6A Total m6ASiteProb AdjPvalue
## rownames: NULL
## rowData names(2): high_confidence_union high_confidence_specific
## colnames(4): SRR21356198 SRR21356199 SRR21356250 SRR21356251
## colData names(31): Status Title ... SRA gsm
This is basically a SummarizedExperiment object with four assay matrices:
m6A: m6A read coverages
Total: Total read coverages
m6ASiteProb: m6A site probabilities as normalized m6A levels
AdjPvalue: Benjamini-Hochberg method corrected p-values (FDR) returned by sample-specific site calling
If you have downloaded the SE objects from the corresponding columns, you can also use the codes on MAE experiments to extract data of interest.
Each of these assays can be access as follow:
## Here, we extract the sites with total m6A coverages across various samples larger than 30
index <- rowSums(assays(m6AOmics)[["m6A"]]) > 30
head(assays(m6AOmics)[["m6A"]][index,])
## SRR21356198 SRR21356199 SRR21356250 SRR21356251
## [1,] 5 8 96 107
## [2,] 35 30 106 72
## [3,] 11 12 66 66
## [4,] 2 0 14 20
## [5,] 0 0 20 23
## [6,] 1 1 20 17
head(assays(m6AOmics)[["Total"]][index,])
## SRR21356198 SRR21356199 SRR21356250 SRR21356251
## [1,] 1603 1280 3436 3292
## [2,] 1601 1272 3693 3502
## [3,] 853 746 5552 5315
## [4,] 146 104 1506 1498
## [5,] 156 109 1819 1758
## [6,] 155 108 2479 2437
head(assays(m6AOmics)[["m6ASiteProb"]][index,])
## SRR21356198 SRR21356199 SRR21356250 SRR21356251
## [1,] 0.16747946 0.11371423 0.09302264 0.12464093
## [2,] 0.08064296 0.08052901 0.09851074 0.06021041
## [3,] 0.08606274 0.08173375 0.07641284 0.07309876
## [4,] 0.09286189 0.20339818 0.10589075 0.06977205
## [5,] 0.24320226 0.20720083 0.08483256 0.07063602
## [6,] 0.11517010 0.10382544 0.13426555 0.17830780
head(assays(m6AOmics)[["AdjPvalue"]][index,])
## SRR21356198 SRR21356199 SRR21356250 SRR21356251
## [1,] 1 1 0.5864301 0.4063583
## [2,] 1 1 0.5439526 1.0000000
## [3,] 1 1 1.0000000 1.0000000
## [4,] 1 1 1.0000000 1.0000000
## [5,] 1 1 1.0000000 1.0000000
## [6,] 1 1 1.0000000 1.0000000
To access the expression levels of GLORI sequencing data, you may
extract the TranscriptOmics
experiment:
TranscriptOmics <- experiments(MAE)[["TranscriptOmics"]]
TranscriptOmics
## class: RangedSummarizedExperiment
## dim: 58650 4
## metadata(0):
## assays(2): ReadCounts RPKM
## rownames(58650): ENSG00000223972 ENSG00000227232 ... ENSG00000231514
## ENSG00000235857
## rowData names(6): gene_id gene_name ... symbol entrezid
## colnames(4): SRR21356198 SRR21356199 SRR21356250 SRR21356251
## colData names(31): Status Title ... SRA gsm
This SE only contains two assay, which are the original read counts on all the genes and the normalized RPKM:
head(assays(TranscriptOmics)[["ReadCounts"]])
## SRR21356198 SRR21356199 SRR21356250 SRR21356251
## ENSG00000223972 1 3 44 32
## ENSG00000227232 43 39 556 510
## ENSG00000278267 0 0 0 0
## ENSG00000243485 1 0 2 1
## ENSG00000237613 1 0 0 0
## ENSG00000268020 0 0 0 2
head(assays(TranscriptOmics)[["RPKM"]])
## SRR21356198 SRR21356199 SRR21356250 SRR21356251
## ENSG00000223972 0.03563207 0.1269188 0.089399386 0.068082910
## ENSG00000227232 1.96767639 2.1189148 1.450777398 1.393485443
## ENSG00000278267 0.00000000 0.0000000 0.000000000 0.000000000
## ENSG00000243485 0.06055009 0.0000000 0.006905348 0.003615446
## ENSG00000237613 0.05071505 0.0000000 0.000000000 0.000000000
## ENSG00000268020 0.00000000 0.0000000 0.000000000 0.008788977
To access the consistent genomic ranges used to count m6A and total
read coverages, you can extract the Rowrange data from
m6AOmics
:
rowRanges(m6AOmics)
## GRanges object with 2788705 ranges and 2 metadata columns:
## seqnames ranges strand | high_confidence_union
## <Rle> <IRanges> <Rle> | <numeric>
## [1] 1 11873 + | 0
## [2] 1 11896 + | 0
## [3] 1 11940 + | 0
## [4] 1 12017 + | 0
## [5] 1 12147 + | 0
## ... ... ... ... . ...
## [2788701] MT 14324 - | 0
## [2788702] MT 15023 - | 0
## [2788703] MT 15413 - | 0
## [2788704] MT 15639 - | 0
## [2788705] MT 15968 - | 0
## high_confidence_specific
## <numeric>
## [1] 0
## [2] 0
## [3] 0
## [4] 0
## [5] 0
## ... ...
## [2788701] 0
## [2788702] 0
## [2788703] 0
## [2788704] 0
## [2788705] 0
## -------
## seqinfo: 25 sequences (1 circular) from GRCh38 genome
The metacolumn in the rowrange data of m6AOmics
are the
dummy variables indicating if the sites are supported by technical
orthogonal integration. high_confidence_union
indicates if
the site is supported by integration of any orthogonal technique pair,
namely high-confidence m6A sites. high_confidence_specific
indicates if the site is supported by the integration of any orthogonal
technique pair covering current technqiue.
To access the genomic locations of all the genes, you can extract the
Rowrange from TranscriptOmics
:
rowRanges(TranscriptOmics)
## GRanges object with 58650 ranges and 6 metadata columns:
## seqnames ranges strand | gene_id
## <Rle> <IRanges> <Rle> | <character>
## ENSG00000223972 chr1 11869-14409 + | ENSG00000223972
## ENSG00000227232 chr1 14404-29570 - | ENSG00000227232
## ENSG00000278267 chr1 17369-17436 - | ENSG00000278267
## ENSG00000243485 chr1 29554-31109 + | ENSG00000243485
## ENSG00000237613 chr1 34554-36081 - | ENSG00000237613
## ... ... ... ... . ...
## ENSG00000224240 chrY 26549425-26549743 + | ENSG00000224240
## ENSG00000227629 chrY 26586642-26591601 - | ENSG00000227629
## ENSG00000237917 chrY 26594851-26634652 - | ENSG00000237917
## ENSG00000231514 chrY 26626520-26627159 - | ENSG00000231514
## ENSG00000235857 chrY 56855244-56855488 + | ENSG00000235857
## gene_name gene_biotype seq_coord_system
## <character> <character> <character>
## ENSG00000223972 DDX11L1 transcribed_unproces.. chromosome
## ENSG00000227232 WASH7P unprocessed_pseudogene chromosome
## ENSG00000278267 MIR6859-1 miRNA chromosome
## ENSG00000243485 MIR1302-2 lincRNA chromosome
## ENSG00000237613 FAM138A lincRNA chromosome
## ... ... ... ...
## ENSG00000224240 CYCSP49 processed_pseudogene chromosome
## ENSG00000227629 SLC25A15P1 unprocessed_pseudogene chromosome
## ENSG00000237917 PARP4P1 unprocessed_pseudogene chromosome
## ENSG00000231514 FAM58CP processed_pseudogene chromosome
## ENSG00000235857 CTBP2P1 processed_pseudogene chromosome
## symbol entrezid
## <character> <list>
## ENSG00000223972 DDX11L1 100287596,100287102,727856,...
## ENSG00000227232 WASH7P <NA>
## ENSG00000278267 MIR6859-1 102466751
## ENSG00000243485 MIR1302-2 105376912,100302278
## ENSG00000237613 FAM138A 654835,645520,641702
## ... ... ...
## ENSG00000224240 CYCSP49 <NA>
## ENSG00000227629 SLC25A15P1 <NA>
## ENSG00000237917 PARP4P1 <NA>
## ENSG00000231514 FAM58CP <NA>
## ENSG00000235857 CTBP2P1 <NA>
## -------
## seqinfo: 25 sequences (1 circular) from hg38 genome
The sample annotations provided by data generators are also provided in MAE. For most of the sequencing techniques, the sample annotations for site summary SE and expression level SE are identical. For antibody-assisted techniques, the sample annotations for site summary are from IP samples, and those for expression levels are from input samples. If you access sample annotation from MAE object, you can obtain a complete sample annotation tables for both IP and input samples. Here, GLORI is a chemical-assisted techniques. Therefore, their sample annotations are the same.
head(colData(MAE))
## DataFrame with 4 rows and 31 columns
## Status Title Sample.type
## <character> <character> <character>
## SRR21356198 Public on Aug 06, 2022 HEK293T_Takara_GLORI.. SRA
## SRR21356199 Public on Aug 06, 2022 HEK293T_Takara_GLORI.. SRA
## SRR21356250 Public on Aug 06, 2022 HEK293T_GLORI+_rep2 SRA
## SRR21356251 Public on Aug 06, 2022 HEK293T_GLORI+_rep1 SRA
## Source.name Organism Characteristics
## <character> <character> <character>
## SRR21356198 HEK293T Homo sapiens cell line: HEK293Tce..
## SRR21356199 HEK293T Homo sapiens cell line: HEK293Tce..
## SRR21356250 HEK293T Homo sapiens cell line: HEK293Tce..
## SRR21356251 HEK293T Homo sapiens cell line: HEK293Tce..
## Treatment.protocol Growth.protocol Extracted.molecule
## <character> <character> <character>
## SRR21356198 Transfection was per.. HEK293T, MEF, HeLa c.. polyA RNA
## SRR21356199 Transfection was per.. HEK293T, MEF, HeLa c.. polyA RNA
## SRR21356250 Transfection was per.. HEK293T, MEF, HeLa c.. polyA RNA
## SRR21356251 Transfection was per.. HEK293T, MEF, HeLa c.. polyA RNA
## Extraction.protocol Library.strategy Library.source
## <character> <character> <character>
## SRR21356198 mRNA was purified by.. RNA-Seq transcriptomic
## SRR21356199 mRNA was purified by.. RNA-Seq transcriptomic
## SRR21356250 mRNA was purified by.. RNA-Seq transcriptomic
## SRR21356251 mRNA was purified by.. RNA-Seq transcriptomic
## Library.selection Instrument.model Description
## <character> <character> <character>
## SRR21356198 cDNA HiSeq X Ten wildtype HEK293T cel..
## SRR21356199 cDNA HiSeq X Ten wildtype HEK293T cel..
## SRR21356250 cDNA HiSeq X Ten wildtype HEK293T cel..
## SRR21356251 cDNA HiSeq X Ten wildtype HEK293T cel..
## Data.processing Submission.date Last.update.date
## <character> <character> <character>
## SRR21356198 Adapters and low-qua.. Aug 04, 2022 Oct 05, 2022
## SRR21356199 Adapters and low-qua.. Aug 04, 2022 Oct 05, 2022
## SRR21356250 Adapters and low-qua.. Aug 04, 2022 Oct 05, 2022
## SRR21356251 Adapters and low-qua.. Aug 04, 2022 Oct 05, 2022
## Contact.name E.mail.s. Organization.name
## <character> <character> <character>
## SRR21356198 Kai Li li_kai@pku.edu.cn Peking University
## SRR21356199 Kai Li li_kai@pku.edu.cn Peking University
## SRR21356250 Kai Li li_kai@pku.edu.cn Peking University
## SRR21356251 Kai Li li_kai@pku.edu.cn Peking University
## Department Lab Street.address
## <character> <character> <character>
## SRR21356198 School of Life Science Yi Lab Yi He Yuan Street No.5
## SRR21356199 School of Life Science Yi Lab Yi He Yuan Street No.5
## SRR21356250 School of Life Science Yi Lab Yi He Yuan Street No.5
## SRR21356251 School of Life Science Yi Lab Yi He Yuan Street No.5
## City ZIP.Postal.code Country Platform.ID BioSample
## <character> <integer> <character> <character> <character>
## SRR21356198 BeiJing 100871 China GPL20795 SAMN30155964
## SRR21356199 BeiJing 100871 China GPL20795 SAMN30155965
## SRR21356250 BeiJing 100871 China GPL20795 SAMN30156016
## SRR21356251 BeiJing 100871 China GPL20795 SAMN30156017
## SRA gsm
## <character> <character>
## SRR21356198 SRX17361992 GSM6432643
## SRR21356199 SRX17361991 GSM6432642
## SRR21356250 SRX17361940 GSM6432591
## SRR21356251 SRX17361939 GSM6432590
Finally, the MultiAssayExperiment object also contains fitted parameters of the site calling models.
metadata(MAE)
## $OmixM6A_para
## bg_proportion fg_proportion alpha_m6A_bg beta_m6A_bg alpha_m6A_fg
## SRR21356198 0.7143169 0.2856831 1.280619 45.11532 0.4055157
## SRR21356199 0.7185518 0.2814482 1.270012 43.87573 0.4051288
## SRR21356250 0.6212593 0.3787407 4.506799 224.56290 0.4858578
## SRR21356251 0.6275281 0.3724719 4.570400 222.54540 0.4880603
## beta_m6A_fg
## SRR21356198 1.080324
## SRR21356199 1.072557
## SRR21356250 1.404342
## SRR21356251 1.402531
This can also be accessed through the metadata of site summary SE:
metadata(m6AOmics)
## $OmixM6A_para
## bg_proportion fg_proportion alpha_m6A_bg beta_m6A_bg alpha_m6A_fg
## SRR21356198 0.7143169 0.2856831 1.280619 45.11532 0.4055157
## SRR21356199 0.7185518 0.2814482 1.270012 43.87573 0.4051288
## SRR21356250 0.6212593 0.3787407 4.506799 224.56290 0.4858578
## SRR21356251 0.6275281 0.3724719 4.570400 222.54540 0.4880603
## beta_m6A_fg
## SRR21356198 1.080324
## SRR21356199 1.072557
## SRR21356250 1.404342
## SRR21356251 1.402531
m6AConquer provides both Genomic Ranges and CSV files for IDR analysis-based integration result. Both file types contain the same content. Currently, we only provide the individual and merged integration results across orthogonal techniques. You are free to use any of them.
For each technique pair, we also provide two kinds of integration results: positive reproducible sites and all reproducible sites.
Positive Reproducible Site: Includes reproducible sites with site calling FDR < 0.05
All Reproducible Site: Includes all reproducible sites (IDR < 0.05)
To access the individual integration results in Genomic Ranges file format, please download the corresponding R object from the corresponding column. As an illustration, here we use the positive integration results between GLORI and eTAM-seq. First, load the GenomicRanges packages and read the file into R:
suppressWarnings(suppressPackageStartupMessages(library(GenomicRanges)))
gr <- readRDS("m6A_high_confidence_eTAM_seq&GLORI.rds")
head(gr)
## GRanges object with 6 ranges and 7 metadata columns:
## seqnames ranges strand | m6A_ratio_eTAM-seq m6A_ratio_GLORI
## <Rle> <IRanges> <Rle> | <character> <character>
## [1] chr22 31279336 * | 0.525547445255474 0.394285714285714
## [2] chr2 169573658 * | 0.863636363636364 0.940594059405941
## [3] chr10 97456897 * | 0.760191846522782 0.805139186295503
## [4] chr12 45727259 * | 0.96969696969697 0.93859649122807
## [5] chr2 46759468 * | 1 0.936708860759494
## [6] chr7 2730214 * | 0.521885521885522 0.617563739376771
## m6A_probability_eTAM-seq m6A_probability_GLORI Pvalue_adjusted_eTAM-seq
## <character> <character> <character>
## [1] 0.999999999996195 1 -6.01961937441581e-11
## [2] 0.999999999999598 1 -1.53502357499078e-11
## [3] 1 1 1.67802757017329e-11
## [4] 1 1 -3.1630979718438e-11
## [5] 1 1 2.41069862922253e-12
## [6] 0.999999999999863 1 1.02874903255979e-11
## Pvalue_adjusted_GLORI Irreproducible_discovery_rate
## <character> <character>
## [1] 1.05850158497544e-11 2.67563748934663e-14
## [2] -3.589161085134e-14 6.08402217494586e-14
## [3] 7.29093266298641e-11 9.04831765069503e-14
## [4] 2.21052712374319e-12 9.22595333463505e-14
## [5] -5.70916743768332e-12 9.68114477473137e-14
## [6] 7.56542394679467e-12 1.00586206031039e-13
## -------
## seqinfo: 25 sequences from an unspecified genome; no seqlengths
The Genomic Ranges file shows the genomic locations of all the positive reproducible sites identified by IDR between GLORI and eTAM-seq, with a total site number of 91979. The meta columns of the results contain the m6A methylation ratios, posterior probabilities, and BH corrected p-values modelled in each technique, along with the IDR value evaluated.
To access the merged integration results, you may load the merged genomic ranges file into R:
gr_merged <- readRDS("m6A_high_confidence_overall.rds")
head(gr_merged)
## GRanges object with 6 ranges and 2 metadata columns:
## seqnames ranges strand | support_number support_technique
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] chr1 139472 * | 1 eTAM-seq&GLORI
## [2] chr1 185038 * | 2 GLORI&m6A-SAC-seq, G..
## [3] chr1 185233 * | 1 GLORI&m6ACE-seq
## [4] chr1 632756 * | 1 eTAM-seq&GLORI
## [5] chr1 826939 * | 2 eTAM-seq&GLORI, GLOR..
## [6] chr1 841507 * | 1 GLORI&m6ACE-seq
## -------
## seqinfo: 25 sequences from an unspecified genome; no seqlengths
The meta columns of the merged results contain the names and numbers of supported orthogonal technique pair. Totally, 146250 reproducible sites are identified across all orthogonal technique pairs.
To access the integration results through CSV tables, simply read the CSV tables into R:
csv <- read.csv("m6A_high_confidence_eTAM_seq&GLORI.csv", row.names = NULL)
head(csv)
## seqnames start end width strand m6A_ratio_eTAM_seq m6A_ratio_GLORI
## 1 chr22 31279336 31279336 1 * 0.5255474 0.3942857
## 2 chr2 169573658 169573658 1 * 0.8636364 0.9405941
## 3 chr10 97456897 97456897 1 * 0.7601918 0.8051392
## 4 chr12 45727259 45727259 1 * 0.9696970 0.9385965
## 5 chr2 46759468 46759468 1 * 1.0000000 0.9367089
## 6 chr7 2730214 2730214 1 * 0.5218855 0.6175637
## m6A_probability_eTAM_seq m6A_probability_GLORI Pvalue_adjusted_eTAM_seq
## 1 1 1 -6.019619e-11
## 2 1 1 -1.535024e-11
## 3 1 1 1.678028e-11
## 4 1 1 -3.163098e-11
## 5 1 1 2.410699e-12
## 6 1 1 1.028749e-11
## Pvalue_adjusted_GLORI Irreproducible_discovery_rate
## 1 1.058502e-11 2.675637e-14
## 2 -3.589161e-14 6.084022e-14
## 3 7.290933e-11 9.048318e-14
## 4 2.210527e-12 9.225953e-14
## 5 -5.709167e-12 9.681145e-14
## 6 7.565424e-12 1.005862e-13
csv_merged <- read.csv("m6A_high_confidence_overall.csv", row.names = NULL)
head(csv_merged)
## seqnames start end width strand support_number
## 1 chr1 139472 139472 1 * 1
## 2 chr1 185038 185038 1 * 2
## 3 chr1 185233 185233 1 * 1
## 4 chr1 632756 632756 1 * 1
## 5 chr1 826939 826939 1 * 2
## 6 chr1 841507 841507 1 * 1
## support_technique
## 1 eTAM-seq&GLORI
## 2 GLORI&m6A-SAC-seq, GLORI&m6ACE-seq
## 3 GLORI&m6ACE-seq
## 4 eTAM-seq&GLORI
## 5 eTAM-seq&GLORI, GLORI&m6ACE-seq
## 6 GLORI&m6ACE-seq