Alignment-based workflow • Zorn

First set up your Zorn/Bascet workdirectory as before. If you wish to run these steps on a SLURM cluster, see separate vignette and adapt accordingly.

library(Zorn)
bascet_runner.default <- LocalRunner(direct = TRUE, show_script=TRUE)
bascetRoot <- "/home/yours/an_empty_workdirectory"

We will use BWA for alignment, which wants the data in FASTQ format. Here we assume that you use a single-cell WGS chemistry that produces paired reads, and direct Zorn to produce R1/R2 FASTQ files (only R1 specified). We name the output Bascets “asfq”, taking as input the typical sharded reads:

(SLURM-compatible step)

### Get reads in fastq format
BascetMapTransform(
  bascetRoot,
  inputName="filtered",   #default; can omit
  outputName="asfq",  ###name parameter
  out_format="R1.fq.gz"
)

Before alignment, you need to prepare your reference by indexing it. You can do this using “bwa index” in the terminal; but you can also do it via Zorn. This assumes that you have a FASTA-file named all.fa (can also be gzipped):

(SLURM-compatible step)

#Build a reference
BascetIndexGenomeBWA(
  bascetRoot,
  "/your_reference/all.fa"
)

You can now proceed with alignment using BWA. By default, this command outputs both unsorted and sorted BAM-files.

The sorted BAM-files are required for counting of reads in genomic regions, i.e., WGS species counting and RNA-seq
The unsorted BAM-files are required for filtering of host reads

(SLURM-compatible step)

### Perform alignment
BascetAlignToReference(
  bascetRoot,
  useReference="/your_reference/all.fa",
  numLocalThreads=10
)

TODO: Filtering

Host filtering workflow

BascetFilterAlignment(
  bascetRoot,
  outputName="filter_paired",
  keep_mapped=TRUE
)

Counting workflow

If you want to analyze where reads mapped, either for RNA-seq counting for genes, or more generally, what the count is within a genomic region, you can use Signac for great flexibility in this task. Signac is designed to analyze single-cell ATAC-seq data, where the reads are stored in a specialized BED-file. You can use the command below to generate this file.

Note one caveat: depending on chemistry, this workflow may not handle UMIs, and just report raw read counts. However, for quick and dirty operations this is typically enough.

(SLURM-compatible step)

### Generate fragments BED file suited for quantifying reads/chromosome using Signac later 
BascetBam2Fragments(
  bascetRoot
)

If you just want to get the number of reads per chromosome (i.e. to compare a metagenomic sample to a number of defined strains), then we provide a simpler counter. The output is a small matrix that you can load with Seurat:

(SLURM-compatible step)

### Count reads per chromosome
BascetCountChrom(
  bascetRoot
)