Preparing input files from raw data formats

If you are starting from a raw data format (FASTQ, mzML, RAW, WIFF, etc.) or want to perform additional pre-processing steps such as normalization or filtering, the tutorials in this section show how to do so. We also show how to prepare data files from public databases such as GEO.

We provide four types of example cases to guide users in preparing their input objects and injecting them into the platform. The example cases illustrate how to prepare input data:

  1. from FASTQ files,

  2. from gene counts table or from the GEO repository,

  3. from single-cell data,

  4. from LC-MS/MS proteomics data.

All the scripts needed for the data cleaning and preprocessing examples can be found under the scripts/ folder.

From FASTQ files

Starting from FASTQ files, we recommend using the GREP2 package to obtain gene counts through quality control, trimming, and quantification of gene abundance. Afterwards, the user can refer to the examples in the next section to prepare input data from the gene counts.

From gene counts table or GEO repository

Users can prepare input data from their own gene counts or download a relevant dataset from a repository such as GEO. Some examples are provided in the following scripts:
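Separately from the project's own scripts, the kind of filtering and normalization mentioned earlier can be illustrated with a minimal, generic sketch. It assumes a counts table held as a plain dictionary (gene to per-sample raw counts), drops genes with very low total counts, and converts the rest to log2(CPM + 1). The function name filter_and_normalize, the min_total threshold, and the toy data are hypothetical and chosen only for illustration; they do not describe what the platform or its scripts actually do.

```python
import math


def filter_and_normalize(counts, min_total=10):
    """Filter low-count genes and return log2(CPM + 1) values.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    min_total: hypothetical threshold on the summed count across samples.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))

    # Drop genes whose summed count across all samples is below the threshold.
    kept = {g: row for g, row in counts.items() if sum(row) >= min_total}

    # Library size per sample, computed on the retained genes only.
    lib_sizes = [sum(row[j] for row in kept.values()) for j in range(n_samples)]
    if any(size == 0 for size in lib_sizes):
        raise ValueError("at least one sample has a library size of zero")

    # Counts per million, then log2 with a pseudocount of 1.
    return {
        g: [math.log2(row[j] / lib_sizes[j] * 1e6 + 1) for j in range(n_samples)]
        for g, row in kept.items()
    }


# Toy example: three genes measured in two samples.
raw_counts = {
    "GeneA": [100, 200],
    "GeneB": [0, 1],      # total of 1, removed by the filter
    "GeneC": [900, 800],
}
normalized = filter_and_normalize(raw_counts, min_total=10)
```

In this toy table, GeneB is removed by the filter and the remaining genes are scaled by the per-sample library size. Real datasets usually call for a dedicated normalization package; the sketch only shows the shape of the computation.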

From single-cell data

Single-cell RNA sequencing experiments provide valuable insights into complex biological systems: they can reveal rare cell populations, uncover relationships between genes, and track the trajectories of cell lineages. Below we provide some data preparation examples from single-cell experiments:

From LC-MS/MS proteomics data

Two examples are provided below for LC-MS/MS proteomics data preprocessing:
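Independently of those examples, one common preprocessing pattern for protein intensity tables can be sketched generically: log2-transform the intensities, treat zero or absent values as missing, and center each sample on its median. The sketch below assumes a plain dictionary (protein to per-sample intensities); the function name log2_median_center and the toy values are hypothetical, and the actual examples may use different steps or tools.

```python
import math
from statistics import median


def log2_median_center(intensities):
    """Log2-transform intensities and subtract each sample's median.

    intensities: dict mapping protein -> list of raw intensities per sample.
    Values that are None or not positive are treated as missing (None).
    """
    if not intensities:
        return {}
    n_samples = len(next(iter(intensities.values())))

    # Log2-transform, keeping missing values as None.
    logged = {
        p: [math.log2(v) if v is not None and v > 0 else None for v in row]
        for p, row in intensities.items()
    }

    centered = {p: [None] * n_samples for p in logged}
    for j in range(n_samples):
        observed = [row[j] for row in logged.values() if row[j] is not None]
        if not observed:
            continue  # sample with no measured proteins stays all-missing
        sample_median = median(observed)
        for p, row in logged.items():
            if row[j] is not None:
                centered[p][j] = row[j] - sample_median
    return centered


# Toy example: three proteins measured in two samples.
raw_intensities = {
    "P1": [1024.0, 2048.0],
    "P2": [256.0, 0.0],    # zero in the second sample is treated as missing
    "P3": [4096.0, 512.0],
}
centered = log2_median_center(raw_intensities)
```

Median centering is only one of several normalization choices for proteomics data, and missing values are deliberately left unimputed here.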