Counts file

The file ‘counts’ contains the measurements (genes, proteins, etc..) for each sample listed in the samples file. Just like the samples, the counts.csv file is tabular (.csv), where each row describes the features (genes, proteins, etc..) and each column describes the samples.

The rows contains gene IDs, which can be in most common formats (such as HGCN or Ensembl), but not in the Entrez number format. If you are using Entrez numbers, please convert them to Ensembl IDs using tools such as Syngo.

The values should always be numerical, with the exception of “NA” in case of a lack of data. Failure to do so will result in an error.

Below is a simple example of how a counts.csv file should look like.

sample1

sample2

sample3

sample4

sample5

gene1

543.6

1556.1

413.0

887.9

123.4

gene2

6.5

14.7

2.3

42.4

56.7

gene3

10.4

763.5

NA

0

89.0

gene4

3217.4

0

4983.2

7493.8

210.2

gene5

98770.5

113498.0

498351.6

88134.1

345.6

gene6

0

NA

14.9

0

789.0

gene7

47648.8

0

32682.0

93873.2

123.4

Note

The formats accepted as features (genes, proteins are ENSEMBL, ENSEMBLTRAN, UNIGENE, REFSEQ, ACCNUM and UNIPROT and gene SYMBOL). Also note that the platform will not accept transcript IDs. You will need to convert them to Gene IDs. This will result in multiple gene entries that the platform will merge.

See also

If you are familiar with R, you can think of the counts file as a data.frame object. We provide an example samples file that can be accessed by installing playbase devtools::install_github("bigomics/playbase") and running playbase::COUNTS.