[ ]:
import sctoolbox
from sctoolbox.utils import bgcolor
01 - Assembling or loading anndata object
1 - Description
This notebook is dedicated to loading or creating an anndata object. The anndata object is prepared for the following analysis notebooks and finally stored as an .h5ad file. Depending on the available data files, there are multiple options to create the anndata object.
1. .h5ad file
Choose this option if you have a .h5ad file. The file could be provided by a preprocessing pipeline, a public dataset or a preceding analysis.
2. star solo quant folder
This option assembles an anndata object from the standard star solo output folder (quant/). This is done by scanning through the folder structure and using the *_matrix.mtx, *_barcodes.tsv and *_genes.tsv files to create one anndata object per sample. The sample anndata objects are finally combined.
3. .mtx, barcodes.tsv, genes.tsv
Choose this option if you have the expression matrix in .mtx format, a file containing the barcodes (*_barcodes.tsv) and a file containing the genes (*_genes.tsv). Use it when these three files are available, e.g. from a public dataset.
4. convert from R object
Use this option if the data was processed using R. The input can be either a .rds or a .robj file.
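If it is unclear which option applies to a given input, the rules above can be sketched as a small helper. This is only an illustration: `guess_input_option` is a hypothetical function, not part of sctoolbox, and the directory checks are assumed heuristics (a folder named `quant` or containing one suggests option 2, a folder holding `.mtx` files suggests option 3).

```python
from pathlib import Path


def guess_input_option(path):
    """Return the option number (1-4) suggested by a path, or None if unclear.

    Hypothetical helper: the rules below are heuristics, not sctoolbox logic.
    """
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".h5ad":
        return 1  # option 1: existing .h5ad file
    if suffix in {".rds", ".robj"}:
        return 4  # option 4: R object that needs converting
    if p.is_dir():
        # Assumption: a star solo result is a folder called "quant"
        # or a folder that contains one.
        if p.name == "quant" or (p / "quant").is_dir():
            return 2
        # Assumption: option 3 folders hold the .mtx file directly.
        if any(p.glob("*.mtx")):
            return 3
    return None
```

For example, `guess_input_option("test_data/adata_rna.h5ad")` returns `1` because only the file extension is inspected for files.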
2 - Setup
[ ]:
import sctoolbox.utilities as utils
import sctoolbox.assemblers as assembler
import sctoolbox.file_converter as converter
utils.settings_from_config("config.yaml", key="01")
3 - Read in data
⬐ Fill in input data here ⬎
[ ]:
%bgcolor PowderBlue
# For option 1: The path to an existing .h5ad file
path_h5ad = "test_data/adata_rna.h5ad"
# For option 2: Path to a star solo quant directory
path_quant = ""
# For option 3: Directory containing .mtx, barcodes.tsv and genes.tsv
path_mtx = ""
# For option 4: Path to a Seurat (.rds, .robj) file
path_rds = ""
[ ]:
# Exactly one of the four paths must be set
if sum(map(lambda x: x != "", [path_h5ad, path_quant, path_mtx, path_rds])) != 1:
    del path_h5ad, path_quant, path_mtx, path_rds
    raise ValueError("Please set exactly one of the above variables. Adjust the cell above and re-run.")
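The same "exactly one path" rule can also be written so that the error message names the offending variables. The following is a minimal sketch; `selected_option` is a hypothetical helper and not part of sctoolbox.

```python
def selected_option(**paths):
    """Return the name of the single non-empty path, or raise ValueError."""
    chosen = [name for name, value in paths.items() if value]
    if len(chosen) != 1:
        raise ValueError(
            f"Expected exactly one input path, got {len(chosen)}: {chosen}"
        )
    return chosen[0]


option = selected_option(
    path_h5ad="test_data/adata_rna.h5ad", path_quant="", path_mtx="", path_rds=""
)
print(option)  # path_h5ad
```

Passing the variables as keyword arguments keeps their names available for the error message, which a plain list of values would lose.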
3.1 - Option 1: Read from h5ad
[ ]:
if path_h5ad:
    adata = utils.load_h5ad(path_h5ad)
3.2 - Option 2: Assemble from preprocessing pipeline ‘quant’ folder
⬐ Fill in input data here ⬎
[ ]:
%bgcolor PowderBlue
# Set up additional sample information below.
# Follows the scheme:
# <sample name>:<type>:<value>
# E.g.:
# sample1:condition:room_air
# sample1:timepoint:early
# sample2:timepoint:late
the_10X_yml = []
[ ]:
if path_quant:
    adata = assembler.from_quant(path_quant, the_10X_yml)
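To check entries before passing them on, the `<sample name>:<type>:<value>` scheme can be parsed with a few lines of plain Python. This is a sketch only: `parse_sample_info` is a hypothetical helper, and how `assembler.from_quant` interprets the list internally is not shown in this notebook.

```python
def parse_sample_info(entries):
    """Turn '<sample name>:<type>:<value>' strings into {sample: {type: value}}."""
    info = {}
    for entry in entries:
        # Split at most twice so that the value itself may contain ':'
        parts = [part.strip() for part in entry.split(":", 2)]
        if len(parts) != 3 or not all(parts):
            raise ValueError(
                f"Expected '<sample name>:<type>:<value>', got {entry!r}"
            )
        sample, key, value = parts
        info.setdefault(sample, {})[key] = value
    return info


example = [
    "sample1:condition:room_air",
    "sample1:timepoint:early",
    "sample2:timepoint:late",
]
print(parse_sample_info(example))
# {'sample1': {'condition': 'room_air', 'timepoint': 'early'}, 'sample2': {'timepoint': 'late'}}
```

Malformed entries such as `"sample1:condition"` raise a `ValueError` early, before any data is read.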
3.3 - Option 3: Create an anndata object from .mtx, barcodes.tsv and genes.tsv
[ ]:
if path_mtx:
    adata = assembler.from_mtx(path_mtx)
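Before assembling, it can help to confirm that every matrix file has its matching barcodes and genes files. The sketch below assumes that the three files of a sample share a common prefix (for example `s1_matrix.mtx`, `s1_barcodes.tsv`, `s1_genes.tsv`); `find_mtx_triplets` is a hypothetical helper and does not reflect how `assembler.from_mtx` is implemented.

```python
from pathlib import Path


def find_mtx_triplets(directory):
    """Group *_matrix.mtx, *_barcodes.tsv and *_genes.tsv files by prefix."""
    directory = Path(directory)
    triplets = {}
    for matrix in sorted(directory.glob("*_matrix.mtx")):
        # Assumption: everything before "_matrix.mtx" identifies the sample
        prefix = matrix.name[: -len("_matrix.mtx")]
        barcodes = directory / f"{prefix}_barcodes.tsv"
        genes = directory / f"{prefix}_genes.tsv"
        missing = [f.name for f in (barcodes, genes) if not f.is_file()]
        if missing:
            raise FileNotFoundError(f"Sample {prefix!r} is missing: {missing}")
        triplets[prefix] = {"matrix": matrix, "barcodes": barcodes, "genes": genes}
    return triplets
```

Calling `find_mtx_triplets(path_mtx)` returns one entry per sample prefix, or raises a `FileNotFoundError` naming the files that are absent.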
3.4 - Option 4: Convert from Seurat to anndata object
[ ]:
# Converting from Seurat to anndata object
if path_rds:
    adata = converter.convertToAdata(file=path_rds)
4 - Prepare anndata
Rename or remove .obs and .var columns as needed.
[ ]:
import pandas as pd
with pd.option_context('display.max_rows', 5, 'display.max_columns', None):
    display(adata.obs)
    display(adata.var)
⬐ Fill in input data here ⬎
[ ]:
%bgcolor PowderBlue
# .obs column names that should be deleted
drop_obs = []
# .obs column names that should be changed. E.g. "old_name": "New Name"
rename_obs = {}
# .var column names that should be deleted
drop_var = []
# .var column names that should be changed. E.g. "old_name": "New Name"
rename_var = {}
[ ]:
# change obs
obs = adata.obs.copy()
obs.drop(columns=drop_obs, inplace=True)
obs.rename(columns=rename_obs, errors='raise', inplace=True)
# change var
var = adata.var.copy()
var.drop(columns=drop_var, inplace=True)
var.rename(columns=rename_var, errors='raise', inplace=True)
# apply changes to adata
adata.obs = obs
adata.var = var
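Because columns are dropped before they are renamed, a name listed in both settings fails at the rename step, and a misspelled name fails at whichever step reaches it first. A pre-check can report all such problems at once. The following is a sketch under these assumptions: `check_column_edits` is a hypothetical helper, and the column names in the example are made up.

```python
def check_column_edits(columns, drop, rename):
    """Validate drop/rename requests against the existing column names."""
    existing = set(columns)
    # Collect every requested column that does not exist
    missing = [name for name in [*drop, *rename] if name not in existing]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    # A dropped column is gone by the time the rename is applied
    overlap = sorted(set(drop) & set(rename))
    if overlap:
        raise ValueError(f"Columns are both dropped and renamed: {overlap}")


# Hypothetical column names, for illustration only
check_column_edits(
    ["sample", "condition", "batch"],
    drop=["batch"],
    rename={"condition": "Condition"},
)
```

In this notebook it could be called as `check_column_edits(adata.obs.columns, drop_obs, rename_obs)` and likewise for `.var`.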
5 - Saving the loaded anndata object
[ ]:
# Overview of loaded adata
display(adata)
[ ]:
# Saving the data
adata_output = "anndata_1.h5ad"
utils.save_h5ad(adata, adata_output)
[ ]:
sctoolbox.settings.close_logfile()