If you want to use EOA online (this site) and your data is less than 30GB, you have to upload your data to our server.
The bottleneck is the slow data upload.
Therefore, you are encouraged to use the EOA local version
here,
which can be easily installed on your local computer using a Docker image.
This will save you a tremendous amount of time on data uploading. More details on how to use the local version are given
here.
Yes. The data files you upload for analysis, as well as any analysis results, are not downloaded or examined in any way by
the administrators, unless required for system maintenance and troubleshooting.
All raw data files are deleted automatically after 30 days,
while result files are kept until you delete your project or user account. No archives or backups are kept.
You are advised to perform the analysis immediately after uploading all your samples.
EcoOmicsAnalyst accepts raw RNA-seq data and does not require pre-processing such as quality control, error correction, or poly(A) tail removal.
The raw RNA-seq data can be paired-end or single-end reads and must be gzip-compressed, with the file extension .fastq.gz or .fq.gz.
For organisms with transcriptome references:
EcoOmicsAnalyst uses Fastp (click here) for raw read quality control
before passing the reads to Kallisto (click here) for ultra-fast, pseudoalignment-based gene quantification.
For organisms without transcriptome references:
EcoOmicsAnalyst uses Seq2Fun (click here) (Peng Liu, et al. (2021) “Ultrafast functional profiling of RNA-seq data for nonmodel organisms” Genome Research),
a novel, high-performance RNA-seq quantification tool for non-model organisms without genome references, to quantify orthologs.
(see details in Data Processing and/or (click here)).
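For readers who want to reproduce the reference-based path outside EOA, the two stages can be sketched as command lines built in Python. The fastp and kallisto options used here are standard options of those tools, but the specific choices (output layout, the 200/20 fragment-length placeholders that kallisto requires for single-end reads, and the function names) are our assumptions for illustration, not the parameters EOA uses internally.

```python
import subprocess

def fastp_cmd(r1, r2=None, out_dir="clean"):
    """Build a fastp command; out_dir is assumed to exist already."""
    name1 = r1.rsplit("/", 1)[-1]
    cmd = ["fastp", "-i", r1, "-o", f"{out_dir}/{name1}"]
    if r2 is not None:
        name2 = r2.rsplit("/", 1)[-1]
        # Paired-end: adapter auto-detection and overlap-based error correction.
        cmd += ["-I", r2, "-O", f"{out_dir}/{name2}",
                "--detect_adapter_for_pe", "--correction"]
    return cmd

def kallisto_cmd(index, reads, out_dir, frag_len=200, frag_sd=20):
    """Build a kallisto quant command for one sample (1 or 2 FASTQ files)."""
    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir]
    if len(reads) == 1:
        # Single-end reads need an estimated fragment length mean and SD
        # (placeholder values; use numbers that match your library prep).
        cmd += ["--single", "-l", str(frag_len), "-s", str(frag_sd)]
    return cmd + list(reads)

def run(cmd):
    """Execute a command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(fastp_cmd("raw/s1_R1.fastq.gz", "raw/s1_R2.fastq.gz"))` followed by `run(kallisto_cmd("transcripts.idx", ["clean/s1_R1.fastq.gz", "clean/s1_R2.fastq.gz"], "quant/s1"))` would process one hypothetical paired-end sample, given an existing kallisto index.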
To use the EcoOmicsAnalyst RNA-seq module, you must register with a valid email address
(click here).
Trying our example data is registration-free.
Due to the large size of RNA-seq data, raw RNA-seq data can only be uploaded via FTP client tools such as
FileZilla.
Please click here to download FileZilla and install it locally.
Then connect to our server using Host: dev.ecoomicsanalyst.ca, Port: 21, and your registered email and password.
Click File -> Site Manager -> New site -> Fill the fields of Host, Port, User and Password.
Click Transfer Settings -> select Active -> click Connect.
Drag all your local files to our server.
Please monitor your file upload in FileZilla. Once all your files have been uploaded successfully, please log in again to process them.
Once your job is submitted, please do not remove or upload any files in FileZilla.
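If you prefer scripting the upload over dragging files in FileZilla, Python's standard `ftplib` module performs the same steps: connect on port 21, log in with your registered email and password, switch to active mode (`set_pasv(False)`, matching the "Active" transfer setting above), and store each file. This is a minimal sketch that assumes the server accepts plain FTP logins exactly as described for FileZilla; the function name and the `ftp_factory` hook (added only so the logic can be tested without a network) are hypothetical.

```python
import ftplib
from pathlib import Path

def upload_fastq(paths, user, password, host="dev.ecoomicsanalyst.ca",
                 port=21, ftp_factory=ftplib.FTP):
    """Upload local files to the FTP server; returns the uploaded file names."""
    ftp = ftp_factory()
    ftp.connect(host, port)
    uploaded = []
    try:
        ftp.login(user, password)
        ftp.set_pasv(False)  # active mode, as in the FileZilla transfer settings
        for path in map(Path, paths):
            with path.open("rb") as fh:
                # STOR writes the file under its base name on the server.
                ftp.storbinary(f"STOR {path.name}", fh)
            uploaded.append(path.name)
    finally:
        ftp.quit()
    return uploaded
```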
Please check more details in our TUTORIALS (click here).
Note: your storage quota is 30GB, and your raw data will be kept for 30 days.
If your dataset is larger than 30GB, please split it into parts: upload and run one part, then delete its files from storage in FileZilla before submitting the next one.
There is no limit on the number of samples you can upload, but the storage for your samples is limited to 30 GB. To process
datasets larger than 30GB, we recommend splitting the samples into multiple groups and deleting each set of files from your account after
they have been processed.
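One simple way to plan those groups is a greedy pass over your samples that starts a new batch whenever the next sample would push the current one over the quota. This is a hypothetical helper, not part of EOA; it assumes you know each sample's total size in GB (both files, for paired-end data), so that the two reads of a sample always land in the same batch.

```python
QUOTA_GB = 30.0  # storage quota stated above

def split_into_batches(sample_sizes_gb, quota_gb=QUOTA_GB):
    """Group samples, in input order, into batches that each fit the quota."""
    batches, current, used = [], [], 0.0
    for name, size in sample_sizes_gb.items():
        if size > quota_gb:
            raise ValueError(f"{name} ({size} GB) alone exceeds the quota")
        if used + size > quota_gb:
            # Close the current batch and start a new one with this sample.
            batches.append(current)
            current, used = [], 0.0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches
```

For example, `split_into_batches({"s1": 12, "s2": 12, "s3": 12, "s4": 5})` gives `[["s1", "s2"], ["s3", "s4"]]`: upload and run the first batch, delete it, then repeat with the second.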
This usually means that you tried to upload a file that is not in compressed FASTQ format, which is the
case if the filenames do not end in ".fastq.gz" or ".fq.gz". Uncompressed FASTQ files are extremely large and would take too
long to upload to the EOA server.
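A quick local check before uploading can catch both wrong extensions and files that were renamed to ".gz" without actually being compressed. The sketch below flags names without an accepted suffix and tests for the two-byte gzip signature (`1f 8b`); the function names are hypothetical. A plain FASTQ file can be compressed with the standard `gzip` command, e.g. `gzip sample.fastq` produces `sample.fastq.gz`.

```python
ALLOWED_SUFFIXES = (".fastq.gz", ".fq.gz")
GZIP_MAGIC = b"\x1f\x8b"  # every gzip file starts with these two bytes

def bad_names(filenames):
    """Return the file names that do not end in an accepted suffix."""
    return [name for name in filenames if not name.endswith(ALLOWED_SUFFIXES)]

def is_gzipped(path):
    """True if the file at `path` begins with the gzip signature."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC
```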
Data upload is typically the most challenging step of using EOA, since FASTQ files are extremely large and must be
uploaded over a network connection. The upload time will depend on both your file size and your network speed. In
our own tests on a laptop and home wifi network, it took ~30 minutes to upload one FASTQ file of ~2.5GB (about 5GB per hour). At that rate, a
full dataset could take 5-10 hours to upload.
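The estimate above is simple proportional arithmetic, sketched below. The reference numbers are the ~2.5GB in ~30 minutes from our test; your own network will differ, so it is worth timing one file and passing your own figures as `ref_gb` and `ref_minutes`.

```python
REF_GB = 2.5        # size of the test file described above
REF_MINUTES = 30.0  # time it took to upload that file

def estimate_upload_hours(total_gb, ref_gb=REF_GB, ref_minutes=REF_MINUTES):
    """Scale an observed upload rate to the total size of a dataset."""
    gb_per_hour = ref_gb / (ref_minutes / 60.0)
    return total_gb / gb_per_hour
```

At ~5GB per hour, `estimate_upload_hours(30)` returns 6.0, so filling the 30GB quota takes roughly six hours.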
Maintaining a constant network connection for this long can be challenging and may tie up your internet bandwidth, so we
suggest starting the upload in the evening. Internet speed is usually fastest at night, and files will be ready to process
in the morning.
EcoOmicsAnalyst has 3 main steps to process raw RNA-seq data.
Raw read quality control, including adapter detection and removal, removal of low-quality reads and bases, and error correction.
The raw reads are first processed with
Fastp
to automatically detect and remove adapters, remove low-quality and too-short reads, trim low-quality bases, and
correct sequencing errors in the overlapping region of paired-end reads.
Read alignment.
The clean reads are submitted to the super-fast, highly accurate pseudoaligner
Kallisto
for read alignment and annotation.
A gene count table will be generated for each sample.
Summarize results into gene abundance tables and figures.
The results generated by the aligner are submitted to the R package
Tximport
to be tidied into a gene abundance table (genes × samples) for all samples, which is ready to be submitted to
ExpressAnalyst
for comprehensive downstream analysis and visualization. Besides the gene abundance table, various informative figures and tables are also generated.
For example, a principal component analysis (PCA) plot shows similarities among your samples. The rarefaction curve shows the sequencing depth
and how many genes the sequences cover. The read quality plot shows read quality before and after quality control.
A table summarizes the numbers of raw and clean reads, mapped reads, and genes.
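The core of the summarization step is collapsing transcript-level counts to genes by summing all transcripts that belong to the same gene, then lining samples up as columns. The sketch below illustrates that idea in plain Python; it is not Tximport itself, and the transcript-to-gene mapping (`tx2gene`) and the function names are hypothetical inputs you would supply.

```python
from collections import defaultdict

def summarize_to_genes(tx_counts, tx2gene):
    """Sum transcript counts into gene counts for one sample."""
    genes = defaultdict(float)
    for tx, count in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            continue  # transcripts missing from the mapping are dropped
        genes[gene] += count
    return dict(genes)

def gene_by_sample(samples, tx2gene):
    """Build a genes x samples table from {sample: {transcript: count}}."""
    per_sample = {s: summarize_to_genes(c, tx2gene) for s, c in samples.items()}
    genes = sorted({g for counts in per_sample.values() for g in counts})
    table = {s: [per_sample[s].get(g, 0.0) for g in genes] for s in samples}
    return genes, table
```

Summing is the same choice as the "Gene-level summarization -> Sum" option used when loading these tables into ExpressAnalyst.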
This depends on your dataset and your goals. If you are only analyzing data from one species and the reference
transcriptome is well-annotated, you likely want to use Kallisto because it includes more features, including
non-coding sequences and gene isoforms. However, if you are analyzing data from a species with a sparsely
annotated reference, you may want Seq2Fun for better functional analysis. Also, if you are analyzing data from
multiple species, even if one or several have references, you may want to use Seq2Fun since this makes
integrating the data and comparing results across species very easy.
We currently provide databases for ~30 organism groups. Please
check here
for more details on how Seq2Fun works. All databases can be downloaded
here.
Note: the ortholog set includes all genes from that group of organisms, including genes that are not orthologous to any other gene.
The groups of organisms can be downloaded from (here).
A core ortholog is defined as an ortholog whose frequency in the group is >= 0.90, consistent with BUSCO.
Note: * frequency >= 0.85; ** frequency >= 0.80.
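To make the threshold concrete, here is a minimal sketch that assumes an ortholog's frequency is the fraction of species in the group whose gene set contains it (our reading of the definition above; the database construction itself may differ). The input layout and function names are hypothetical; passing `threshold=0.85` or `threshold=0.80` mirrors the starred groups in the note.

```python
from collections import Counter

def ortholog_frequencies(species_orthologs):
    """Map each ortholog to the fraction of species that contain it."""
    n_species = len(species_orthologs)
    # set() ensures a species is counted once even if it lists an ortholog twice.
    counts = Counter(o for orths in species_orthologs.values() for o in set(orths))
    return {o: c / n_species for o, c in counts.items()}

def core_orthologs(species_orthologs, threshold=0.90):
    """Return, sorted, the orthologs whose frequency meets the threshold."""
    freqs = ortholog_frequencies(species_orthologs)
    return sorted(o for o, f in freqs.items() if f >= threshold)
```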
EcoOmicsAnalyst has 3 main steps to process raw RNA-seq data for species without a reference genome. The following steps
are all performed by the Seq2Fun software.
Raw read quality control, including adapter detection and removal, removal of low-quality reads and bases, and error correction.
Raw read quality control removes low-quality and too-short reads; trims low-quality bases;
removes sequencing adapters, poly(A) tails, and low-complexity reads;
and performs error correction in the overlapping region of paired-end reads and merges the overlapping paired-end reads.
Clean read alignment via translated search against a protein ortholog database.
Each clean read is translated into all possible amino acid sequences using the six reading frames, and the
longest ones are used to identify its homologous sequence in the protein database.
Summarize results into gene abundance tables and figures. The gene/ortholog abundance table is generated by summarizing
all reads that are mapped to the same ortholog group.
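The six-frame translation in the alignment step can be illustrated in a few lines of Python. This is a simplified sketch, not Seq2Fun's implementation: it translates a read in all six frames with the standard genetic code, splits each translation at stop codons (`*`), and keeps the longest stop-free fragments as search candidates. The function names and the `top_k` parameter are hypothetical, and the real tool's fragment selection and scoring may differ.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq, frame):
    """Translate complete codons from `frame`; unknown codons become X."""
    return "".join(CODONS.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))

def six_frame_translate(seq):
    """Three forward frames followed by three reverse-complement frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return [translate(s, f) for s in (seq, rc) for f in range(3)]

def longest_fragments(seq, top_k=2):
    """Split every frame at stop codons and keep the top_k longest pieces."""
    frags = [frag for prot in six_frame_translate(seq)
             for frag in prot.split("*") if frag]
    return sorted(frags, key=len, reverse=True)[:top_k]
```

For the read `ATGGCC`, `six_frame_translate` returns `["MA", "W", "G", "GH", "A", "P"]`, and `longest_fragments(..., top_k=2)` keeps `["MA", "GH"]`, the two candidates that would then be searched against the protein database.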
EcoOmicsAnalyst uses version 2.0.2 of Seq2Fun. All previous versions of Seq2Fun are still available for
command-line use. Go to www.seq2fun.ca to download and install previous versions.
The main output from EOA is an abundance table that gives the read counts aligned to each feature for each sample. The tables are
structured with sample names in the first row, class labels in the second row, and then counts of features in all following rows.
In addition, EOA produces some plots and a summary report for basic QA/QC. All results files are contained in the Download.zip file.
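If you want to inspect an abundance table before moving to ExpressAnalyst, the layout described above (sample names in the first row, class labels in the second, feature counts below) can be read with the standard `csv` module. This sketch assumes the file is tab-delimited and that the first column holds row headers and feature IDs; the header cells `#NAME` and `#CLASS` in the example are placeholders, and the function name is hypothetical.

```python
import csv
import io

def parse_abundance_table(text):
    """Return (sample names, {sample: class label}, {feature: counts})."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    samples = rows[0][1:]                     # row 1: sample names
    labels = dict(zip(samples, rows[1][1:]))  # row 2: class labels
    counts = {row[0]: [float(x) for x in row[1:]]
              for row in rows[2:] if row}     # remaining rows: feature counts
    return samples, labels, counts
```

For a file on disk, pass its contents, e.g. `parse_abundance_table(open(path).read())`.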
All abundance tables from EcoOmicsAnalyst can be submitted to
ExpressAnalyst,
another web-tool developed by our team for statistical and functional analysis of transcriptomics data.
Select Specify organism -> choose the species that you used for your EOA analysis
Select Data type -> Bulk RNA-seq data (counts)
Select ID type -> Check the table in the "Analysis With Reference" FAQ section to see which ID type is outputted for the transcriptome that you used in the analysis
Select Gene-level summarization -> Sum
Data File -> Choose the All_samples_salmon_txi_abundance_slm.txt file