FASTQC

(move to another section?)

To run QC on all cells as a whole and get an average picture, convert the reads to FASTQ format and then run FASTQC on the resulting files:

### Get reads in fastq format
BascetMapTransform(
  bascetRoot,
  inputName="filtered",   
  outputName="asfq",      
  out_format="R1.fq.gz"
)
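
The conversion above only writes the FASTQ files; FASTQC itself still has to be invoked on them. Below is a minimal sketch of that step from R, assuming the `fastqc` command-line tool is installed and on the PATH. `run_fastqc` is a hypothetical helper and not part of Bascet, and the file-name pattern used to locate the exported files is an assumption, so check the actual names in your Bascet root directory first.

```r
# Hypothetical helper (not part of Bascet): builds and optionally runs a fastqc call.
run_fastqc <- function(fq_files, outdir, threads = 1, dry_run = FALSE) {
  # --outdir and --threads are standard fastqc command-line options
  args <- c("--outdir", outdir, "--threads", as.character(threads), fq_files)
  if (dry_run) {
    return(args)  # only return the arguments, do not run anything
  }
  # fastqc expects the output directory to exist already
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  system2("fastqc", args = shQuote(args))
}

# Assumption: the exported files sit in the Bascet root directory and
# their names start with the outputName used above ("asfq")
bascet_dir <- "path/to/bascetRoot"  # placeholder path
fq_files <- list.files(bascet_dir, pattern = "^asfq.*\\.fq\\.gz$", full.names = TRUE)

# Show the arguments that would be passed to fastqc, without running it
print(run_fastqc(fq_files, outdir = "fastqc_out", dry_run = TRUE))
```

Calling the helper with `dry_run = FALSE` runs FASTQC and writes one HTML report and one zip archive per input file into `outdir`.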

You can also run FASTQC on each individual cell, in which case you do not need to convert to FASTQ as above. This takes a fair bit of time, but can help tell if, e.g., a cluster of cells is caused by technical issues such as adapter content. You first run FASTQC with mapcell:

(SLURM-compatible step)

BascetMapCellFASTQ(
  bascetRoot,
  inputName = "filtered"  #or other source of reads
)

If you have an outlier cell in your dataset, you can inspect its FASTQC HTML report in the following manner (it opens in the RShiny plot pane or in a separate browser):

ShowFASTQCforCell(
    bascetFile,
    cellID="xyz", #name of your cell
    readnum="1" #for R1
)

You can also compare cells by aggregating the data. Note that FASTQC produces rather complex statistics, which need further extraction before they can be plotted simply:

aggr_fastqc <- BascetAggregateFASTQC(
  bascetRoot
)

One relevant statistic is the adapter content across the read:

PlotFASTQCadapterContent(
    aggr_fastqc,
    readnum="1" #for R1
)

You can also retrieve a table of pass/fail statistics:

fastqc_passfail <- GetFASTQCpassfailStats(
    aggr_fastqc,
    readnum="1" #for R1
)
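
A pass/fail table like this is typically used to flag cells for closer inspection. The sketch below shows the idea on a mock table. The layout (one row per cell, one column per FASTQC module, values `pass`/`warn`/`fail`) is an assumption made for illustration, and the object returned by `GetFASTQCpassfailStats` may be organised differently, so inspect it with `str()` before adapting this.

```r
# Mock table standing in for the output of GetFASTQCpassfailStats();
# the cell names and the layout are made up for this example.
fastqc_passfail_mock <- data.frame(
  "Per base sequence quality" = c("pass", "pass", "fail"),
  "Adapter Content" = c("pass", "fail", "warn"),
  row.names = c("cellA", "cellB", "cellC"),
  check.names = FALSE  # keep the module names with spaces as column names
)

# Cells whose adapter-content module did not pass
not_passed <- fastqc_passfail_mock[["Adapter Content"]] != "pass"
flagged <- rownames(fastqc_passfail_mock)[not_passed]
print(flagged)

# Number of cells per status for each module (statuses as rows, modules as columns)
status_counts <- sapply(
  fastqc_passfail_mock,
  function(x) table(factor(x, levels = c("pass", "warn", "fail")))
)
print(status_counts)
```

Cells flagged this way can then be examined one by one with `ShowFASTQCforCell`.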

Because there are so many things you can do with these statistics, we provide a general interface to each table that FASTQC generates:

mystats <- GetFASTQCassembledDF(
    aggr_fastqc, 
    section="see below", 
    readnum="1"
)

Possible values of section are:

  • “Basic Statistics”
  • “Per base sequence quality”
  • “Per sequence quality scores”
  • “Per base sequence content”
  • “Per sequence GC content”
  • “Per base N content”
  • “Sequence Length Distribution”
  • “Sequence Duplication Levels”
  • “Overrepresented sequences”
  • “Adapter Content”
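
These section names match the module headers in the `fastqc_data.txt` file that FASTQC writes for each input, where every module starts with a `>>Name<TAB>status` line and ends with `>>END_MODULE`. To illustrate what one such section contains, here is a small sketch that parses a module from raw text. `parse_fastqc_section` is a hypothetical helper, not a Bascet function, and the example lines are a made-up, shortened excerpt rather than real output.

```r
# Hypothetical helper: extract one module from the lines of a fastqc_data.txt file.
parse_fastqc_section <- function(lines, section) {
  # Module header lines look like ">>Adapter Content<TAB>pass"
  start <- which(startsWith(lines, paste0(">>", section, "\t")))
  if (length(start) != 1) stop("section not found: ", section)
  # Each module is closed by the next ">>END_MODULE" line
  ends <- which(lines == ">>END_MODULE")
  end <- ends[ends > start][1]
  status <- strsplit(lines[start], "\t", fixed = TRUE)[[1]][2]
  body <- if (end - start > 1) lines[(start + 1):(end - 1)] else character(0)

  # Lines starting with "#" are headers; the last one names the columns
  is_header <- startsWith(body, "#")
  rows <- strsplit(body[!is_header], "\t", fixed = TRUE)
  if (length(rows) == 0) return(list(status = status, table = NULL))
  header <- body[is_header]
  col_names <- strsplit(sub("^#", "", header[length(header)]), "\t", fixed = TRUE)[[1]]

  tab <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(tab) <- col_names
  list(status = status, table = tab)  # all table values are still character strings
}

# Made-up, shortened excerpt in the fastqc_data.txt layout
example_lines <- c(
  "##FastQC\t0.12.1",
  ">>Basic Statistics\tpass",
  "#Measure\tValue",
  "Total Sequences\t1000",
  ">>END_MODULE",
  ">>Adapter Content\twarn",
  "#Position\tIllumina Universal Adapter\tNextera Transposase Sequence",
  "1\t0.0\t0.0",
  "2\t0.1\t0.0",
  "3\t6.2\t0.3",
  ">>END_MODULE"
)

res <- parse_fastqc_section(example_lines, "Adapter Content")
print(res$status)
print(res$table)
```

The values come back as character strings, so numeric columns need an explicit `as.numeric()` before plotting or summarising.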