[ ]:
from sctoolbox.utilities import bgcolor
from sctoolbox import settings
Cell type annotation and marker list assembly
1 - Description
This Jupyter Notebook is designed for annotating cell types in clustered AnnData objects. It is divided into two main parts:
Marker List Assembly: This part is used when no existing marker lists are available. It enables users to assemble custom marker lists using the MarkerRepo.
Annotation: This section applies the created or provided marker lists to annotate cell types in AnnData objects.
The parameters are organized in three tables:
1. The first table contains basic parameters necessary for the annotation process.
2. The second table lists parameters specific to the Marker List Assembly section.
3. The third table lists parameters related to the Annotation section.
For a basic analysis, the parameters in the first table should be sufficient. For finer control over marker list assembly and annotation, adjust the parameters in the second and third tables.
1.1 - Parameter Overview
1.1.1 - Essential input data
| Parameter | Description | Options |
|---|---|---|
| | Name of the clustered AnnData file for use. | String |
| | | |
| | Name for the column with the final cell type annotation. If | |
| | Paths to marker lists. If | |
1.1.2 - Marker List Assembly: wrap.create_multiple_marker_lists
| Parameter | Description | Options |
|---|---|---|
| | Specifies the organism for marker list assembly. | |
| | Search terms for marker list assembly, targeting specific columns. | |
| | Additional parameters for marker list assembly. One marker list is created per dictionary. | |
| | Path to MarkerRepo. | String |
| | Path to a custom marker lists folder. If | |
| | The style of the marker lists. Options include “two_column” and “score”. | String |
| | The name of the exported marker lists. | |
| | Use Ensembl IDs instead of gene symbols. | Boolean |
| | Create marker lists via homology even if lists for the organism exist. | Boolean |
| | Display the marker lists of the query. | Boolean |
| | Add marker list IDs to the | |
If column_specific_terms and cml_parameters are None, you can assemble marker lists interactively.
The following columns are currently available for the MarkerRepo query: "ID", "List name", "Date", "Source", "Organism name", "Taxonomy ID", "Submitter name", "Email", "Tags", "Genotype", "Gender", "Life stage", "Tissue" and more.
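Because the search terms are keyed by column name, a misspelled key is easy to miss. The following is a minimal sketch, not part of MarkerRepo, that compares the keys of a search-term dictionary against the columns named above. The helper `unlisted_columns` and the constant `KNOWN_COLUMNS` are hypothetical names introduced for illustration, and since the list above ends with "and more", an unlisted key is only flagged for review rather than treated as an error.

```python
# Columns named in the text above. The notebook states that more exist,
# so this set is deliberately incomplete (assumption for illustration).
KNOWN_COLUMNS = {
    "ID", "List name", "Date", "Source", "Organism name", "Taxonomy ID",
    "Submitter name", "Email", "Tags", "Genotype", "Gender", "Life stage",
    "Tissue",
}


def unlisted_columns(terms):
    """Return the keys of a search-term dict that are not in KNOWN_COLUMNS.

    Hypothetical helper: it does not query MarkerRepo, it only compares names.
    """
    return sorted(set(terms) - KNOWN_COLUMNS)


# Search terms as used later in this notebook.
column_specific_terms = {"Organism name": "human", "Source": "panglao"}

# A dictionary with a likely typo ("Organism" instead of "Organism name").
typo_terms = {"Organism": "human", "Tissue": ["skin", "blood"]}

print(unlisted_columns(column_specific_terms))  # []
print(unlisted_columns(typo_terms))             # ['Organism']
```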
1.1.3 - Annotation Parameters: wrap.run_annotation
| Parameter | Description | Options/Type |
|---|---|---|
| | The AnnData object to annotate. | AnnData object |
| | Use MarkerRepo for annotation. | Boolean |
| | Use SCSA for annotation. | Boolean |
| | Paths to marker list files. | String or list of Strings (e.g., |
| | | String (e.g., “mr”) |
| | | String (e.g., “scsa”) |
| | Column of | |
| | | |
| | A reference annotation in | |
| | If True, keeps all annotation columns. | Boolean |
| | Enables printing of additional information. | Boolean |
| | Shows additional MarkerRepo annotation tables with the first five top-ranked cell types per cluster. | Boolean |
| | Displays UMAP plots of the annotation, if available. | Boolean |
| | Displays all annotations in one table. | Boolean |
| | Overwrites existing files without confirmation if True. | Boolean |
| | Name for the column with the final cell type annotation. If | |
For more information about MarkerRepo, click here.
2 - Setup
[ ]:
import sctoolbox.utilities as utils
import pandas as pd
pd.set_option('display.max_columns', None) #no limit to the number of columns shown
[ ]:
try:
    import markerrepo.wrappers as wrap
    import markerrepo.marker_repo as mr
except ModuleNotFoundError:
    raise ModuleNotFoundError("Please install the latest MarkerRepo version.")
⬐ Fill in input data here ⬎
[ ]:
%bgcolor PowderBlue
# sctoolbox settings
settings.adata_input_dir = "../adatas/"
settings.adata_output_dir = "../adatas/"
clustered_adata = "anndata_4.h5ad"
3 - Loading adata
[ ]:
adata = utils.load_h5ad(clustered_adata)
[ ]:
# Limit the preview to 5 rows while showing all columns
with pd.option_context("display.max_rows", 5, "display.max_columns", None):
    display(adata)
    display(adata.obs)
    display(adata.var)
4 - Essential Input
⬐ Fill in input data here ⬎
[ ]:
%bgcolor PowderBlue
# Annotation settings
clustering_column = "leiden_0.1"
celltype_column_name = None
marker_lists = None
# Marker list assembly
if not marker_lists:
    organism = "human"
    column_specific_terms = {"Organism name": organism, "Source": "panglao"}
    cml_parameters = [
        {"file_name": "panglao_two_column", "style": "two_column"},
        {"file_name": "panglao_score", "style": "score"},
        {"file_name": "tissues_two_column", "style": "two_column",
         "column_specific_terms": {"Tissue": ["skin", "blood"]}},
    ]
    repo_path = "./test_data/marker_repo/"
    lists_path = "./test_data/marker_repo/marker_lists/"
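Since one marker list is created per dictionary in cml_parameters, it can be worth checking the dictionaries before running the assembly. The sketch below is a hypothetical pre-flight check, not a MarkerRepo function: `check_cml_parameters` and `VALID_STYLES` are names made up here, the two styles are the ones named in the parameter table, and treating a repeated file_name as a problem is an assumption (two lists exported under the same name would presumably collide).

```python
# The two styles named in the parameter table of this notebook.
VALID_STYLES = {"two_column", "score"}


def check_cml_parameters(cml_parameters):
    """Collect problems in a list of per-marker-list dictionaries.

    Hypothetical helper: flags unknown styles and repeated file names.
    """
    problems = []
    seen_names = set()
    for i, params in enumerate(cml_parameters):
        style = params.get("style")
        if style is not None and style not in VALID_STYLES:
            problems.append(f"entry {i}: unknown style {style!r}")
        name = params.get("file_name")
        if name in seen_names:
            problems.append(f"entry {i}: duplicate file_name {name!r}")
        if name is not None:
            seen_names.add(name)
    return problems


# The configuration used in this notebook passes the check.
cml_parameters = [
    {"file_name": "panglao_two_column", "style": "two_column"},
    {"file_name": "panglao_score", "style": "score"},
    {"file_name": "tissues_two_column", "style": "two_column",
     "column_specific_terms": {"Tissue": ["skin", "blood"]}},
]
print(check_cml_parameters(cml_parameters))  # []

# A deliberately broken configuration for comparison.
broken = [
    {"file_name": "a", "style": "scores"},
    {"file_name": "a", "style": "score"},
]
print(check_cml_parameters(broken))
# ["entry 0: unknown style 'scores'", "entry 1: duplicate file_name 'a'"]
```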
5 - Assemble marker lists
The paths of the assembled marker lists are stored in the marker_lists variable. They serve as input for the cell type annotation in the next section.
[ ]:
if not marker_lists:
    marker_lists = wrap.create_multiple_marker_lists(
        cml_parameters=cml_parameters,
        repo_path=repo_path,
        lists_path=lists_path,
        organism=organism,
        ensembl=mr.check_ensembl(adata),
        column_specific_terms=column_specific_terms,
        show_lists=True
    )
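Before annotating, it may help to confirm that every path in marker_lists points to an existing file, whether the paths were assembled above or provided by hand. The following is a small sketch using only the standard library. `missing_marker_lists` is a hypothetical helper, and the temporary files merely stand in for real marker lists. It accepts a single string or a list of strings, matching the types given in the annotation parameter table.

```python
import tempfile
from pathlib import Path


def missing_marker_lists(marker_lists):
    """Return the marker list paths that do not point to an existing file.

    Hypothetical helper: accepts one path or a list of paths.
    """
    if isinstance(marker_lists, (str, Path)):
        marker_lists = [marker_lists]
    return [str(p) for p in marker_lists if not Path(p).is_file()]


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "panglao_two_column"
    existing.write_text("placeholder\n")
    absent = Path(tmp) / "not_written"

    result_single = missing_marker_lists(str(existing))
    result_mixed = missing_marker_lists([str(existing), str(absent)])

print(result_single)      # []
print(len(result_mixed))  # 1
```

In the notebook itself, the check would be called as `missing_marker_lists(marker_lists)` right before the annotation cell.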
6 - Annotate adata
⬐ Fill in input data here ⬎
[ ]:
%bgcolor PowderBlue
marker_repo = True
SCSA = True
mr_obs = "MR"
scsa_obs = "SCSA"
rank_genes_column = None
reference_obs = None
show_comparison = True
ignore_overwrite = True
show_plots = True
[ ]:
wrap.run_annotation(
    adata,
    marker_repo=marker_repo,
    SCSA=SCSA,
    marker_lists=marker_lists,
    mr_obs=mr_obs,
    scsa_obs=scsa_obs,
    rank_genes_column=rank_genes_column,
    clustering_column=clustering_column,
    reference_obs=reference_obs,
    show_comparison=show_comparison,
    ignore_overwrite=ignore_overwrite,
    show_plots=show_plots,
    celltype_column_name=celltype_column_name
)
6.1 - Show annotated .obs table
[ ]:
display(adata.obs)
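When both MarkerRepo and SCSA are enabled, a per-cluster summary makes it easier to see where the two methods disagree. The sketch below uses a toy pandas DataFrame in place of adata.obs. The column names "leiden_0.1", "MR" and "SCSA" mirror clustering_column, mr_obs and scsa_obs from the cells above, but the names of the columns that run_annotation actually writes may differ, so check adata.obs.columns first. It also assumes that all cells of a cluster carry the same label.

```python
import pandas as pd

# Toy stand-in for adata.obs (assumed column names, invented labels).
obs = pd.DataFrame({
    "leiden_0.1": ["0", "0", "1", "1", "1"],
    "MR": ["T cell", "T cell", "B cell", "B cell", "B cell"],
    "SCSA": ["T cell", "T cell", "Plasma cell", "Plasma cell", "Plasma cell"],
})

# One row per cluster with the label each method assigned to it.
per_cluster = obs.groupby("leiden_0.1")[["MR", "SCSA"]].first()

# Mark clusters where the two methods give the same label.
per_cluster["agree"] = per_cluster["MR"] == per_cluster["SCSA"]

# Clusters that need a closer look.
disagreeing = per_cluster.index[~per_cluster["agree"]].tolist()

print(per_cluster)
print(disagreeing)  # ['1']
```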
7 - Save adata
[ ]:
utils.save_h5ad(adata, "anndata_annotated.h5ad")