Preparing input files from raw data formats

If your data are in a raw format (FASTQ, mzML, RAW, WIFF, etc.), or you want to perform additional pre-processing steps such as normalization or filtering, this section provides tutorials on how to carry out those steps. It also shows how to prepare data files from public databases such as GEO.

We provide four types of example cases to guide users in preparing their input objects and loading them into the platform. The examples illustrate how to prepare input data:

from FASTQ files,
from a gene counts table or the GEO repository,
from single-cell data,
from LC-MS/MS proteomics data.

All scripts for the data cleaning and preprocessing examples can be found in the scripts/ folder.

From FASTQ files

Starting from FASTQ files, we recommend using the GREP2 package to obtain gene counts. GREP2 covers quality control, read trimming, quantification of gene abundance, and related steps. The examples in the next section then show how to prepare input data from the resulting gene counts.
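GREP2 runs these steps for you. The following Python sketch only illustrates what a read-level quality-control check does, so it is not GREP2's implementation; Python is used for illustration only, since the project's preparation scripts are in R. It parses a FASTQ file four lines at a time and keeps reads whose length and mean Phred score pass thresholds. The helper names (`read_fastq`, `mean_phred`, `filter_reads`), the Phred+33 quality encoding, and the default thresholds are all assumptions.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 encoding (ASCII offset 33)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(handle: TextIO, min_mean_q: float = 20.0, min_len: int = 30) -> Iterator[Record]:
    """Keep reads that are long enough and have a high enough mean quality (hypothetical thresholds)."""
    for header, seq, qual in read_fastq(handle):
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            yield header, seq, qual


# Usage on a tiny in-memory FASTQ: 'I' encodes Q40, '#' encodes Q2.
example = io.StringIO(
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nACGTACGTAC\n+\n##########\n"
)
kept = [header for header, _, _ in filter_reads(example, min_mean_q=20, min_len=5)]
print(kept)  # ['@read1']
```

Dedicated tools also trim adapters and low-quality read ends, which this sketch does not attempt.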

From gene counts table or GEO repository

Users can prepare input data from their own gene counts or download a relevant dataset from a repository such as GEO. Examples are provided in the following scripts:

TCGA-BRCA: pgx-tcga-brca.R
TCGA-PRAD: pgx-tcga-prad.R
GSE10846: pgx-GSE10846-dlbcl.R
GSE114716: pgx-GSE114716-ipilimumab.R
GSE22886: pgx-GSE22886-immune.R
GSE28492: pgx-GSE28492-roche.R
GSE32591: pgx-GSE32591-lupusnephritis.R
GSE53784: pgx-GSE53784-wnvjev.R
GSE88808: pgx-GSE88808-prostate.R
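Whatever its source, a gene counts table usually needs filtering and normalization before use. The Python code below is a minimal sketch that assumes a CSV layout with gene IDs in the first column and one column per sample. It drops genes that fall below a counts-per-million (CPM) threshold in too many samples and returns log2(CPM + 1) values. The thresholds, the pseudo-count of 1, and the helper names are illustrative assumptions, not necessarily what the pgx-*.R scripts do.

```python
import csv
import io
import math
from typing import Dict, List, TextIO, Tuple

Counts = Dict[str, List[float]]  # gene ID -> counts, one value per sample


def read_counts(handle: TextIO) -> Tuple[List[str], Counts]:
    """Read a CSV counts table: gene IDs in column 1, one column per sample."""
    reader = csv.reader(handle)
    header = next(reader)
    counts = {row[0]: [float(x) for x in row[1:]] for row in reader if row}
    return header[1:], counts


def filter_and_log_cpm(counts: Counts, min_cpm: float = 1.0, min_samples: int = 2) -> Counts:
    """Keep genes with CPM >= min_cpm in at least min_samples samples; return log2(CPM + 1)."""
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts per sample, computed before filtering.
    lib_sizes = [sum(vals[j] for vals in counts.values()) for j in range(n_samples)]
    result: Counts = {}
    for gene, vals in counts.items():
        cpm = [v / lib * 1e6 if lib > 0 else 0.0 for v, lib in zip(vals, lib_sizes)]
        if sum(c >= min_cpm for c in cpm) >= min_samples:
            result[gene] = [math.log2(c + 1.0) for c in cpm]
    return result


# Usage with a toy table whose library sizes are all exactly 1,000,000.
table = io.StringIO(
    "gene,s1,s2,s3\n"
    "GENE_A,500000,400000,600000\n"
    "GENE_B,499999,599999,399999\n"
    "GENE_C,1,1,1\n"
)
samples, counts = read_counts(table)
norm = filter_and_log_cpm(counts, min_cpm=10.0, min_samples=2)
print(samples, sorted(norm))  # ['s1', 's2', 's3'] ['GENE_A', 'GENE_B']
```

Filtering on CPM rather than on raw counts keeps the threshold comparable across samples sequenced to different depths.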

From single-cell data

Single-cell RNA sequencing experiments have provided valuable insights into complex biological systems: they can reveal complex and rare cell populations, uncover relationships between genes, and track the trajectories of cell lineages. Below are some examples of data preparation from single-cell experiments:

GSE72056: pgx-GSE72056-scmelanoma.R
GSE92332: pgx-GSE92332-scintestine.R
GSE98638: pgx-GSE98638-scliver.R
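Single-cell count data are sparse, so cells and genes with little signal are often removed before normalization. The sketch below assumes each cell is stored as a hypothetical gene-to-count dictionary. It drops cells with too few detected genes and genes detected in too few cells. It then scales each cell to 10,000 total counts and applies the natural log1p, similar in spirit to Seurat's default LogNormalize method. The thresholds and helper names are assumptions, and the GSE72056, GSE92332, and GSE98638 scripts may use different steps.

```python
import math
from typing import Dict

Cell = Dict[str, float]  # gene -> count for one cell (zero entries may be omitted)
Cells = Dict[str, Cell]  # cell barcode -> counts


def qc_filter(cells: Cells, min_genes: int = 200, min_cells: int = 3) -> Cells:
    """Drop cells with < min_genes detected genes, then genes detected in < min_cells kept cells."""
    kept = {c: g for c, g in cells.items() if sum(1 for v in g.values() if v > 0) >= min_genes}
    n_cells_per_gene: Dict[str, int] = {}
    for g in kept.values():
        for gene, v in g.items():
            if v > 0:
                n_cells_per_gene[gene] = n_cells_per_gene.get(gene, 0) + 1
    keep_genes = {gene for gene, n in n_cells_per_gene.items() if n >= min_cells}
    return {c: {gene: v for gene, v in g.items() if gene in keep_genes} for c, g in kept.items()}


def log_normalize(cells: Cells, scale: float = 10_000.0) -> Cells:
    """Scale each cell to `scale` total counts, then apply the natural log1p."""
    out: Cells = {}
    for c, g in cells.items():
        total = sum(g.values())
        out[c] = {gene: math.log1p(v / total * scale) for gene, v in g.items()} if total > 0 else {}
    return out


# Usage with three toy cells and hypothetical gene names g1..g3.
cells = {
    "cell1": {"g1": 5, "g2": 0, "g3": 5},
    "cell2": {"g1": 2, "g3": 8},
    "cell3": {"g1": 1},
}
filtered = qc_filter(cells, min_genes=2, min_cells=2)
norm = log_normalize(filtered)
print(sorted(filtered))  # ['cell1', 'cell2']
```

Real datasets are usually held in sparse matrix formats rather than dictionaries. The dictionary form is used here only to keep the sketch dependency-free.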

From LC-MS/MS proteomics data

Two examples are provided below for LC-MS/MS proteomics data preprocessing:

Geiger et al. 2016: pgx-geiger2016-arginine.R
Rieckmann et al. 2017: pgx-rieckmann2017-immprot.R
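A common pattern for label-free LC-MS/MS protein intensities has four steps:

1. Treat zeros as missing values.
2. Log2-transform the intensities.
3. Remove proteins quantified in too few samples.
4. Median-center each sample.

The Python sketch below implements that pattern. The 70% valid-value default and the function name are hypothetical, and the Geiger and Rieckmann scripts may preprocess their data differently.

```python
import math
import statistics
from typing import Dict, List, Optional

Intensities = Dict[str, List[float]]          # protein ID -> raw intensity per sample
LogValues = Dict[str, List[Optional[float]]]  # None marks a missing value


def preprocess_intensities(intensities: Intensities, min_valid_frac: float = 0.7) -> LogValues:
    """Log2-transform (zeros -> missing), filter sparse proteins, median-center each sample."""
    if not intensities:
        return {}
    logged: LogValues = {
        p: [math.log2(x) if x > 0 else None for x in vals] for p, vals in intensities.items()
    }
    n = len(next(iter(logged.values())))
    kept = {p: v for p, v in logged.items() if sum(x is not None for x in v) / n >= min_valid_frac}
    # Median of the observed (non-missing) log2 values in each sample column.
    medians = []
    for j in range(n):
        col = [v[j] for v in kept.values() if v[j] is not None]
        medians.append(statistics.median(col) if col else 0.0)
    return {
        p: [x - medians[j] if x is not None else None for j, x in enumerate(v)]
        for p, v in kept.items()
    }


# Usage: P3 is observed in only 1 of 3 samples and is filtered out.
raw = {
    "P1": [1024.0, 2048.0, 4096.0],
    "P2": [256.0, 512.0, 1024.0],
    "P3": [0.0, 0.0, 64.0],
}
result = preprocess_intensities(raw, min_valid_frac=0.6)
print(result)  # {'P1': [1.0, 1.0, 1.0], 'P2': [-1.0, -1.0, -1.0]}
```

Missing values are kept as None rather than imputed, so any downstream imputation remains an explicit, separate choice.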