First set up your Zorn/Bascet work directory as before. If you wish to run these steps on a SLURM cluster, see the separate vignette and adapt accordingly.
library(Zorn)
bascet_runner.default <- LocalRunner(direct = TRUE, show_script=TRUE)
bascetRoot <- "/home/yours/an_empty_workdirectory"
Next, produce a count sketch for each cell. Further options are available if you wish to alter the number of dimensions. You can also reduce the number of dimensions later in R, but that approach requires more memory.
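For intuition, a count sketch compresses each cell's k-mer counts into a fixed-length vector. Every k-mer is hashed to one of `dims` buckets and added there with a hashed sign of +1 or -1. The toy R code below illustrates that general technique only. It is not Bascet's implementation, and it is not needed for the workflow. The helper names `kmer_counts()`, `string_hash()` and `count_sketch()` are hypothetical, and the simple polynomial hash was chosen for readability rather than quality.

```r
# Toy count sketch of k-mer counts (illustration only, NOT Bascet's internal code)

# Count all overlapping k-mers in a single read (hypothetical helper)
kmer_counts <- function(read, k) {
  n <- nchar(read) - k + 1
  if (n < 1) return(integer(0))
  kmers <- substring(read, 1:n, k:(n + k - 1))
  tab <- table(kmers)
  setNames(as.integer(tab), names(tab))
}

# Simple deterministic polynomial string hash; values stay exact in doubles
string_hash <- function(s, seed, mult) {
  h <- seed
  for (code in utf8ToInt(s)) {
    h <- (h * mult + code) %% 2147483647
  }
  h
}

# Project named k-mer counts onto a fixed number of dimensions
count_sketch <- function(counts, dims) {
  sketch <- numeric(dims)
  for (kmer in names(counts)) {
    bucket <- string_hash(kmer, seed = 17, mult = 31) %% dims + 1    # which dimension
    sign <- if (string_hash(kmer, seed = 29, mult = 37) %% 2 == 0) 1 else -1  # random +/- sign
    sketch[bucket] <- sketch[bucket] + sign * counts[[kmer]]
  }
  sketch
}
```

For example, `count_sketch(kmer_counts("ACGTACGTAC", 3), dims = 8)` returns a length-8 vector. The sketch is linear in the counts, so doubling every count doubles the vector.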
#Internally wraps mapcell to compute count sketches for each cell
BascetComputeCountSketch(
bascetRoot
)
This results in a Bascet-ZIP with all the count sketches. Turn this into a count matrix file using the following command:
#Gather count sketches into a single matrix file
BascetGatherCountSketch(
bascetRoot
)
You can now load all of this data into R as a Seurat object:
adata <- BascetLoadCountSketchMatrix(
bascetRoot
)
The data is rather large, so we recommend enabling multithreading for the Seurat functions that support it.
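Seurat parallelizes several of its functions through the future package. The following is a minimal setup. The four workers and the 8 GiB limit on global objects exported to each worker are placeholder values, so adjust them to your machine.

```r
# Enable parallel execution for Seurat functions built on the future framework
library(future)

# "multisession" runs workers as background R sessions and works on all platforms.
# "multicore" (forked processes) is often faster on Linux/macOS,
# but it is unavailable on Windows and inside RStudio.
plan("multisession", workers = 4)

# Raise the size limit for globals exported to workers (default 500 MiB) to 8 GiB
options(future.globals.maxSize = 8 * 1024^3)
```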
Note that the data was loaded directly as a reduction rather than as regular counts. This is because the individual sketch values have little meaning on their own, so it makes little sense to run PCA, SVD or similar dimensionality reduction on them. Instead, we recommend using all the dimensions. The following computes a UMAP using every dimension available in the reduction:
library(Seurat)
# Use every dimension of the k-mer sketch reduction as input to UMAP
reduction_name <- "kmersketch"
adata <- RunUMAP(
adata,
dims = 1:ncol(Embeddings(adata, reduction = reduction_name)),
reduction = reduction_name,
metric = "cosine"
)
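The UMAP call above uses `metric = "cosine"`. One plausible motivation is that cosine similarity ignores vector length. A count sketch is linear in the counts, so a cell sequenced twice as deeply, with the same k-mer composition, gets a sketch scaled by two and therefore the same cosine similarity to other cells. The helpers `cosine_similarity()` and `cosine_matrix()` below are hypothetical illustrations of this measure, not part of Zorn or Seurat.

```r
# Cosine similarity between two sketch vectors
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Pairwise cosine similarities for a cells-by-dimensions matrix
cosine_matrix <- function(emb) {
  norms <- sqrt(rowSums(emb^2))
  unit <- emb / norms  # divides each row by its own norm
  tcrossprod(unit)     # unit %*% t(unit)
}
```

On real data, you could apply `cosine_matrix()` to a small subset of cells, for example `Embeddings(adata, reduction = reduction_name)[1:100, ]`. The full cell-by-cell matrix grows quadratically with the number of cells.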
The result can be visualized:
DimPlot(adata, reduction = "umap")
We recommend integrating this object with the KRAKEN2 output to get a hint of what the clusters represent.