First set up your Zorn/Bascet working directory as before. If you wish to run these steps on a SLURM cluster, see separate vignette and adapt accordingly.
library(Zorn)
bascet_runner.default <- LocalRunner(direct = TRUE, show_script=TRUE)
bascetRoot <- "/home/yours/an_empty_workdirectory"
KRAKEN2 wants the data in FASTQ format. Here we assume that you use a single-cell WGS chemistry that produces paired reads, and direct Zorn to produce R1/R2 FASTQ files (only R1 specified). We name the output Bascets “asfq”, taking as input the typical sharded reads:
### Get reads in fastq format
BascetMapTransform(
bascetRoot,
"filtered", #default; can omit
"asfq", ###name parameter
out_format="R1.fq.gz"
)
To run KRAKEN2, you need a database. You can get them here: https://benlangmead.github.io/aws-indexes/k2
If you want a small one then consider standard-8. Unzip it in a directory. You can then run KRAKEN2 like this:
### Run Kraken on each cell
BascetRunKraken(
bascetRoot,
useKrakenDB="/your_disk/kraken/standard-8",
numLocalThreads=20
)
### Produce a count matrix of taxonomy features
BascetMakeKrakenCountMatrix(
bascetRoot,
numLocalThreads=20
)
Note that there are two steps here. First KRAKEN2 classifies each read to a taxonomy. In the second step, we count the taxonomic reads for each cell. This ends up being a rather small matrix that you can process using Seurat.
============ TODO the rest ==============