[ ]:

from sctoolbox.utilities import bgcolor
from sctoolbox import settings

Cell type annotation and marker list assembly

1 - Description

This Jupyter Notebook is designed for annotating cell types in clustered AnnData objects. It is divided into two main parts:

Marker List Assembly: This part is used when no existing marker lists are available. It enables users to assemble custom marker lists using the MarkerRepo.
Annotation: This section applies the created or provided marker lists to annotate cell types in AnnData objects.

The parameters are organized in three tables:

1. The first table contains basic parameters necessary for the annotation process.
2. The second table lists parameters specific to the Marker List Assembly section.
3. The third table lists parameters related to the Annotation section.

For a basic analysis, the parameters in the first table should be sufficient. However, for more advanced fine-tuning and detailed control of the analysis, the parameters in the second and third tables become critical.

1.1 - Parameter Overview

1.1.1 - Essential input data

Parameter	Description	Options
`clustered_adata`	Name of the clustered AnnData file for use.	String
`clustering_column`	`.obs` column used for cell type assignment.	`None` (select interactively) or String (e.g., `"leiden"`)
`celltype_column_name`	Name for the column with the final cell type annotation. If `None`, keeps all annotation columns.	`None` or String (e.g., `"pred_celltype"`)
`marker_lists`	Paths to marker lists. If `None`, assemble lists using MarkerRepo.	`None` or String or list of Strings (e.g., `"/path/my_markers"` or `["/heart_markers/markers", "/human/panglao"]`)
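
The three accepted forms of `marker_lists` (`None`, a single path, or a list of paths) can be brought into one shape before further use. The helper below is a minimal sketch of that idea; `normalize_marker_lists` is a hypothetical name and is not part of sctoolbox or MarkerRepo.

```python
from typing import List, Optional, Union


def normalize_marker_lists(marker_lists: Optional[Union[str, List[str]]]) -> List[str]:
    """Return marker list paths as a list (hypothetical helper, not a MarkerRepo function)."""
    if marker_lists is None:
        # No lists provided: the notebook assembles them with MarkerRepo instead.
        return []
    if isinstance(marker_lists, str):
        # A single path is wrapped in a list.
        return [marker_lists]
    # Copy so that the caller's list is not modified by later code.
    return list(marker_lists)
```

An empty result then corresponds to the case in which the Marker List Assembly part of the notebook is needed.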

1.1.2 - Marker List Assembly: `wrap.create_multiple_marker_lists`

Parameter	Description	Options
`organism`	Specifies the organism for marker list assembly.	`None` or String (e.g., `"human"`)
`column_specific_terms`	Search terms for marker list assembly, targeting specific columns.	`None` or Dictionary (e.g., `{"Source": "panglao.se"}`)
`cml_parameters`	Additional parameters for marker list assembly. One marker list is created per dictionary.	`None` or List of dictionaries (e.g., `[{"style":"two_column", "file_name":"two_column"}, {"style":"score", "file_name":"score"}]`)
`repo_path`	Path to MarkerRepo.	String
`lists_path`	Path to a custom marker lists folder. If `None`, the lists folder of the `repo_path` will be used.	`None` or String (e.g., `"/path/my_markers"`)
`style`	The style of the marker lists. Options include `"two_column"` and `"score"`.	String
`file_name`	The name of the exported marker lists.	`None` (enter interactively) or String
`ensembl`	Use Ensembl IDs instead of gene symbols.	Boolean
`force_homology`	Create marker lists via homology even if lists for the organism exist.	Boolean
`show_lists`	Display the marker lists of the query.	Boolean
`adata`	Add marker list IDs to the `.uns` table of an AnnData object, if provided.	`None` or AnnData

If `column_specific_terms` and `cml_parameters` are both `None`, the marker lists can be assembled interactively.

The following columns are currently available for the MarkerRepo query: "ID", "List name", "Date", "Source", "Organism name", "Taxonomy ID", "Submitter name", "Email", "Tags", "Genotype", "Gender", "Life stage", "Tissue" and more.
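
To illustrate what a column-specific query expresses, the sketch below filters a small, invented metadata table with a dictionary shaped like `column_specific_terms`. The function `filter_by_terms`, the toy rows, and the case-insensitive substring matching are all assumptions made for illustration; MarkerRepo's actual matching logic may differ.

```python
import pandas as pd

# Toy metadata table; the rows are invented for illustration only.
metadata = pd.DataFrame(
    {
        "ID": ["list_1", "list_2", "list_3"],
        "Source": ["panglao.se", "panglao.se", "in-house"],
        "Organism name": ["human", "mouse", "human"],
        "Tissue": ["blood", "skin", "heart"],
    }
)


def filter_by_terms(table, column_specific_terms):
    """Keep rows where every queried column contains at least one of its terms."""
    mask = pd.Series(True, index=table.index)
    for column, terms in column_specific_terms.items():
        # Accept a single string or a list of strings per column.
        if isinstance(terms, str):
            terms = [terms]
        column_values = table[column].astype(str).str.lower()
        # A row matches a column if any term occurs as a substring.
        column_mask = pd.Series(False, index=table.index)
        for term in terms:
            column_mask |= column_values.str.contains(term.lower(), regex=False)
        # All queried columns must match.
        mask &= column_mask
    return table[mask]


hits = filter_by_terms(metadata, {"Organism name": "human", "Source": "panglao"})
print(hits["ID"].tolist())  # ['list_1']
```

In this sketch, terms for different columns are combined with a logical AND, while several terms for the same column (e.g., `{"Tissue": ["skin", "blood"]}`) are combined with a logical OR.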

1.1.3 - Annotation Parameters: `wrap.run_annotation`

Parameter	Description	Options/Type
`adata`	The AnnData object to annotate.	AnnData object
`marker_repo`	Use MarkerRepo for annotation.	Boolean
`SCSA`	Use SCSA for annotation.	Boolean
`marker_lists`	Paths to marker list files.	String or list of Strings (e.g., `"/path/my_markers"` or `["/heart_markers/markers", "/human/panglao"]`)
`mr_obs`	`.obs` prefix for MarkerRepo annotation.	String (e.g., `"mr"`)
`scsa_obs`	`.obs` prefix for SCSA annotation.	String (e.g., `"scsa"`)
`rank_genes_column`	Column of `.uns` table with rank genes scores. If `None`, the ranking will be performed on the `clustering_column`.	`None` or String
`clustering_column`	`.obs` column used for cell type assignment.	`None` (select interactively) or String (e.g., `"leiden"`)
`reference_obs`	A reference annotation in `.obs` for comparison.	`None` or String
`keep_all`	If True, keeps all annotation columns.	Boolean
`verbose`	Enables printing of additional information.	Boolean
`show_ct_tables`	Shows additional MarkerRepo annotation tables with the first five top-ranked cell types per cluster.	Boolean
`show_plots`	Displays UMAP plots of the annotation, if available.	Boolean
`show_comparison`	Displays all annotations in one table.	Boolean
`ignore_overwrite`	Overwrites existing files without confirmation if True.	Boolean
`celltype_column_name`	Name for the column with the final cell type annotation. If `None`, keeps all annotation columns.	`None` or String (e.g., `"pred_celltype"`)

For more information about MarkerRepo, see the MarkerRepo documentation.

2 - Setup

[ ]:

import sctoolbox.utilities as utils
import pandas as pd
pd.set_option('display.max_columns', None)  # no limit to the number of columns shown

[ ]:

try:
    import markerrepo.wrappers as wrap
    import markerrepo.marker_repo as mr
except ModuleNotFoundError:
    raise ModuleNotFoundError("Please install the latest MarkerRepo version.")

⬐ Fill in input data here ⬎

[ ]:

%bgcolor PowderBlue

# sctoolbox settings
settings.adata_input_dir = "../adatas/"
settings.adata_output_dir = "../adatas/"

clustered_adata = "anndata_4.h5ad"

3 - Loading adata

[ ]:

adata = utils.load_h5ad(clustered_adata)

[ ]:

with pd.option_context("display.max_rows", 5, "display.max_columns", None):
    display(adata)
    display(adata.obs)
    display(adata.var)

4 - Essential Input

⬐ Fill in input data here ⬎

[ ]:

%bgcolor PowderBlue

# Annotation settings
clustering_column = "leiden_0.1"
celltype_column_name = None
marker_lists = None

# Marker list assembly
if not marker_lists:
    organism = "human"
    column_specific_terms = {"Organism name": organism, "Source": "panglao"}

    cml_parameters = [{"file_name": "panglao_two_column", "style": "two_column"},
                      {"file_name": "panglao_score", "style": "score"},
                      {"file_name": "tissues_two_column", "style": "two_column",
                       "column_specific_terms": {"Tissue": ["skin", "blood"]}}]

    repo_path = "./test_data/marker_repo/"
    lists_path = "./test_data/marker_repo/marker_lists/"
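
Before running the annotation, it can help to confirm that the chosen `clustering_column` is actually present in `.obs`. The following is a minimal sketch under stated assumptions: `check_obs_column` is a hypothetical helper (not part of sctoolbox or MarkerRepo), and a small pandas DataFrame stands in for `adata.obs`.

```python
import pandas as pd


def check_obs_column(obs, column):
    """Raise a KeyError listing the available columns if `column` is missing from `obs`."""
    if column is None:
        # None means the column is selected interactively later, so nothing to check.
        return
    if column not in obs.columns:
        raise KeyError(
            f"Column '{column}' not found in .obs. Available columns: {list(obs.columns)}"
        )


# Toy stand-in for adata.obs; in the notebook, pass adata.obs instead.
toy_obs = pd.DataFrame(
    {"leiden_0.1": ["0", "1", "0"]},
    index=["cell_a", "cell_b", "cell_c"],
)
check_obs_column(toy_obs, "leiden_0.1")  # passes silently
```

In the notebook itself, the equivalent call would be `check_obs_column(adata.obs, clustering_column)`.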

5 - Assemble marker lists

The paths of the assembled marker lists are stored in the `marker_lists` variable. They serve as input for the cell type annotation in the next section.

[ ]:

if not marker_lists:
    marker_lists = wrap.create_multiple_marker_lists(
        cml_parameters=cml_parameters,
        repo_path=repo_path,
        lists_path=lists_path,
        organism=organism,
        ensembl=mr.check_ensembl(adata),
        column_specific_terms=column_specific_terms,
        show_lists=True
    )

6 - Annotate adata

⬐ Fill in input data here ⬎

[ ]:

%bgcolor PowderBlue

marker_repo = True
SCSA = True
mr_obs = "MR"
scsa_obs = "SCSA"
rank_genes_column = None
reference_obs = None
show_comparison = True
ignore_overwrite = True
show_plots = True

[ ]:

wrap.run_annotation(
    adata,
    marker_repo=marker_repo,
    SCSA=SCSA,
    marker_lists=marker_lists,
    mr_obs=mr_obs,
    scsa_obs=scsa_obs,
    rank_genes_column=rank_genes_column,
    clustering_column=clustering_column,
    reference_obs=reference_obs,
    show_comparison=show_comparison,
    ignore_overwrite=ignore_overwrite,
    show_plots=show_plots,
    celltype_column_name=celltype_column_name
)

6.1 - Show annotated .obs table

[ ]:

display(adata.obs)
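
Beyond displaying the full table, a cross-tabulation of clusters against assigned cell types gives a compact view of the annotation result. The sketch below uses an invented stand-in for `adata.obs`; the column name `"pred_celltype"` and all values are assumptions, and the column names actually produced depend on `mr_obs`, `scsa_obs`, and `celltype_column_name`.

```python
import pandas as pd

# Toy stand-in for adata.obs; values and the "pred_celltype" column name are invented.
toy_obs = pd.DataFrame(
    {
        "leiden_0.1": ["0", "0", "1", "1", "1"],
        "pred_celltype": ["T cell", "T cell", "B cell", "B cell", "T cell"],
    }
)

# Rows: clusters, columns: assigned cell types, values: number of cells.
summary = pd.crosstab(toy_obs["leiden_0.1"], toy_obs["pred_celltype"])
print(summary)
```

A cluster whose cells are spread over several cell types in such a table may be worth inspecting more closely.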

7 - Save adata

[ ]:

utils.save_h5ad(adata, "anndata_annotated.h5ad")
