[ ]:
import sctoolbox
from sctoolbox.utils import bgcolor

01 - Assembling or loading anndata object


1 - Description

This notebook loads or creates an anndata object. The anndata object is prepared for the subsequent analysis notebooks and finally stored as an .h5ad file. Depending on the available data files, there are multiple options to create the anndata object.

1. .h5ad file

Choose this option if you have an .h5ad file. The file could be provided by a preprocessing pipeline, a public dataset or a preceding analysis.

2. STARsolo quant folder

This option assembles an anndata object from the standard STARsolo output folder (quant/). The folder structure is scanned for the *_matrix.mtx, *_barcodes.tsv and *_genes.tsv files, which are used to create one anndata object per sample. The per-sample anndata objects are finally combined into a single object.

3. .mtx, barcodes.tsv, genes.tsv

Choose this option if you have the expression matrix in .mtx format, a file containing the barcodes (*_barcodes.tsv) and a file containing the genes (*_genes.tsv). These three files are often what is available for a public dataset.

4. convert from R object

Use this option if the data was processed using R and stored as a Seurat object. The input can be either an .rds or an .robj file.
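The folder scan described for option 2 can be pictured with a minimal sketch. This is not the sctoolbox implementation: `find_sample_triplets` is a hypothetical helper, and it assumes that the text in front of each file suffix is the sample name.

```python
import tempfile
from pathlib import Path

# File suffixes named in the description of option 2.
SUFFIXES = ("_matrix.mtx", "_barcodes.tsv", "_genes.tsv")


def find_sample_triplets(quant_dir):
    """Return {sample: {suffix: path}} for every complete triplet below quant_dir."""
    found = {}
    for path in sorted(Path(quant_dir).rglob("*")):
        if not path.is_file():
            continue
        for suffix in SUFFIXES:
            if path.name.endswith(suffix):
                # Assumption: everything before the suffix is the sample name.
                sample = path.name[: -len(suffix)]
                found.setdefault(sample, {})[suffix] = path
    # Keep only samples for which all three files were found.
    return {s: files for s, files in found.items() if len(files) == len(SUFFIXES)}


# Toy folder: sample "s1" is complete, sample "s2" lacks two of its files.
with tempfile.TemporaryDirectory() as tmp:
    sample_dir = Path(tmp) / "quant" / "s1"
    sample_dir.mkdir(parents=True)
    for suffix in SUFFIXES:
        (sample_dir / f"s1{suffix}").touch()
    (Path(tmp) / "quant" / "s2_matrix.mtx").touch()
    triplets = find_sample_triplets(tmp)

print(sorted(triplets))  # only the complete sample is kept
```

Skipping incomplete samples is a design choice of this sketch. A real pipeline might prefer to raise an error so that missing files are noticed early.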


2 - Setup

[ ]:
import sctoolbox.utilities as utils
import sctoolbox.assemblers as assembler
import sctoolbox.file_converter as converter

utils.settings_from_config("config.yaml", key="01")

3 - Read in data


⬐ Fill in input data here ⬎

[ ]:
%bgcolor PowderBlue

# For option 1: The path to an existing .h5ad file
path_h5ad = "test_data/adata_rna.h5ad"

# For option 2: Path to a STARsolo quant directory
path_quant = ""

# For option 3: Directory containing .mtx, barcodes.tsv and genes.tsv
path_mtx = ""

# For option 4: Path to a Seurat (.rds, .robj) file
path_rds = ""

[ ]:
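# Exactly one of the four input paths must be set.
if sum(map(lambda x: x != "", [path_h5ad, path_quant, path_mtx, path_rds])) != 1:
    # Remove the variables so that later cells cannot run with an ambiguous input.
    del path_h5ad, path_quant, path_mtx, path_rds
    raise ValueError("Please set exactly one of the above variables. Adjust the cell above and re-run.")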

3.1 - Option 1: Read from h5ad

[ ]:
if path_h5ad:
    adata = utils.load_h5ad(path_h5ad)

3.2 - Option 2: Assemble from preprocessing pipeline ‘quant’ folder

⬐ Fill in input data here ⬎

[ ]:
%bgcolor PowderBlue

# Set up additional sample information below.
# Follows the scheme:
# <sample name>:<type>:<value>
# E.g.:
# sample1:condition:room_air
# sample1:timepoint:early
# sample2:timepoint:late
the_10X_yml = []
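
The `<sample name>:<type>:<value>` scheme can be read as a nested mapping from sample to annotation. The following is a minimal sketch of that reading, assuming the entries are given as strings. `parse_sample_info` is a hypothetical helper, and whether `assembler.from_quant` interprets the list in exactly this way is an assumption.

```python
def parse_sample_info(entries):
    """Turn '<sample name>:<type>:<value>' strings into {sample: {type: value}}."""
    info = {}
    for entry in entries:
        # Split at the first two colons only, so the value may contain ':'.
        parts = entry.split(":", 2)
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"Expected '<sample name>:<type>:<value>', got {entry!r}")
        sample, key, value = parts
        info.setdefault(sample, {})[key] = value
    return info


example = [
    "sample1:condition:room_air",
    "sample1:timepoint:early",
    "sample2:timepoint:late",
]
parsed = parse_sample_info(example)
print(parsed["sample1"])
```

In this reading, a sample without an entry for a given type (here `sample2` and `condition`) simply has no value for it.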

[ ]:
if path_quant:
    adata = assembler.from_quant(path_quant, the_10X_yml)

3.3 - Option 3: Create an anndata object from .mtx, barcodes.tsv and genes.tsv

[ ]:
if path_mtx:
    adata = assembler.from_mtx(path_mtx)
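
To illustrate what the three input files contain, here is a minimal sketch that uses only the standard library. It does not show how `assembler.from_mtx` works internally. `read_mtx_triplets` and the toy data are hypothetical, and the layout (genes as rows, barcodes as columns, integer counts) is an assumption based on the common convention for such files.

```python
import io


def read_mtx_triplets(handle):
    """Parse a Matrix Market coordinate file into (n_rows, n_cols, entries)."""
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("Not a Matrix Market coordinate file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(x) for x in line.split())
    entries = []
    for line in handle:
        if not line.strip():
            continue
        row, col, value = line.split()
        # Matrix Market indices are 1-based; convert them to 0-based.
        entries.append((int(row) - 1, int(col) - 1, int(value)))
    if len(entries) != n_entries:
        raise ValueError("Entry count does not match the size line")
    return n_rows, n_cols, entries


# Toy data: 3 genes (rows) x 2 barcodes (columns).
mtx = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
genes = ["GeneA", "GeneB", "GeneC"]
barcodes = ["AAAC", "TTTG"]

n_genes, n_barcodes, entries = read_mtx_triplets(mtx)

# Re-key as counts[barcode][gene], i.e. cells first, as in an anndata object.
counts = {bc: {} for bc in barcodes}
for gene_idx, bc_idx, value in entries:
    counts[barcodes[bc_idx]][genes[gene_idx]] = value

print(counts)
```

The barcodes and genes files only supply the labels. The matrix file refers to them by position, which is why the order of the lines in all three files has to match.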

3.4 - Option 4: Convert from Seurat to anndata object

[ ]:
# Converting from Seurat to anndata object
if path_rds:
    adata = converter.convertToAdata(file=path_rds)

4 - Prepare anndata

Rename or remove .obs and .var columns as needed.

[ ]:
import pandas as pd

with pd.option_context('display.max_rows', 5, 'display.max_columns', None):
    display(adata.obs)
    display(adata.var)

⬐ Fill in input data here ⬎

[ ]:
%bgcolor PowderBlue

# .obs column names that should be deleted
drop_obs = []

# .obs column names that should be changed. E.g. "old_name": "New Name"
rename_obs = {}

# .var column names that should be deleted
drop_var = []

# .var column names that should be changed. E.g. "old_name": "New Name"
rename_var = {}

[ ]:
# change obs
obs = adata.obs.copy()

obs.drop(columns=drop_obs, inplace=True)
obs.rename(columns=rename_obs, errors='raise', inplace=True)

# change var
var = adata.var.copy()

var.drop(columns=drop_var, inplace=True)
var.rename(columns=rename_var, errors='raise', inplace=True)

# apply changes to adata
adata.obs = obs
adata.var = var

5 - Saving the loaded anndata object

[ ]:
# Overview of loaded adata
display(adata)

[ ]:
# Saving the data
adata_output = "anndata_1.h5ad"
utils.save_h5ad(adata, adata_output)

[ ]:
sctoolbox.settings.close_logfile()