Statescope Tutorial

In this tutorial, we will walk through setting up and running the Statescope package in Python for bulk data analysis. We will cover installation, environment setup, data import, deconvolution, refinement, and cell state discovery.

1. Installation

Please follow the Installation Steps and create or activate your conda environment:

conda create -n statescope_env python=3.8
conda activate statescope_env

Make sure you have the necessary dependencies installed before proceeding.

2. Importing Dependencies

Below is an example of how you might structure your imports within a Python script or Jupyter notebook. Adjust paths and filenames as needed.

import Statescope
from Statescope import Initialize_Statescope
import pandas as pd
import pickle

3. Loading Your Bulk Data

For this tutorial, we’ll demonstrate using test data from a GitHub repository. Replace the URL below with your own bulk data source as needed.

# Example test dataset (subset of transcriptome data)
Bulk = pd.read_csv(
    'https://github.com/tgac-vumc/OncoBLADE/raw/refs/heads/main/data/Transcriptome_matrix_subset.txt',
    sep='\t',
    index_col='symbol'
)

To load your own bulk RNA-seq data, check the dataset requirements on the Input Requirements page.
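Before passing a matrix to Statescope, it can help to run a few generic sanity checks on it. The sketch below assumes a genes x samples layout with gene symbols in the index, as in the test dataset above. The helper `check_bulk_matrix` and the toy values are hypothetical and are not part of Statescope; the authoritative list of requirements is the Input Requirements page.

```python
import pandas as pd


def check_bulk_matrix(bulk: pd.DataFrame) -> list:
    """Return a list of problems found in a genes x samples expression matrix."""
    problems = []
    if bulk.index.has_duplicates:
        problems.append("duplicate gene symbols in the index")
    if bulk.index.isna().any():
        problems.append("missing gene symbols in the index")
    # Every sample column should hold numeric expression values.
    non_numeric = [
        str(col) for col, dtype in bulk.dtypes.items()
        if not pd.api.types.is_numeric_dtype(dtype)
    ]
    if non_numeric:
        problems.append(f"non-numeric sample columns: {non_numeric}")
    if bulk.isna().any().any():
        problems.append("missing expression values")
    return problems


# Toy matrix standing in for real bulk data (hypothetical values).
toy_bulk = pd.DataFrame(
    {"sample_1": [10.0, 0.0, 5.0], "sample_2": [3.0, 7.0, 1.0]},
    index=pd.Index(["GENE_A", "GENE_B", "GENE_A"], name="symbol"),
)
print(check_bulk_matrix(toy_bulk))
```

With your own data, call `check_bulk_matrix(Bulk)` and resolve any reported problems before initializing Statescope.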

4. Initializing Statescope

Choose an appropriate tumor or tissue type. Currently supported options include:

NSCLC
PDAC
PBMC

More details about the pre-processed scRNA-seq references are available in Processed Signature Datasets.

Statescope_model = Initialize_Statescope(Bulk, TumorType='NSCLC')

To provide your own scRNA-seq dataset instead, check Input Requirements for the correct format.

import scanpy as sc

file_path = 'scRNA.h5ad'  # scRNA-seq data must be in h5ad format
Signature = sc.read_h5ad(file_path)
# celltype_key is the column in Signature.obs that holds the cell-type labels
Statescope_model = Initialize_Statescope(
    Bulk, Signature=Signature, celltype_key='celltype_key', Ncores=40
)
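A wrong or misspelled cell-type column is an easy mistake to make when supplying your own reference. Since the `.obs` attribute of an AnnData object is a pandas DataFrame, the label column can be inspected before initialization. The sketch below uses a toy DataFrame in place of `Signature.obs`; the helper `summarize_celltypes` and the column name `celltype` are hypothetical.

```python
import pandas as pd


def summarize_celltypes(obs: pd.DataFrame, celltype_key: str) -> pd.Series:
    """Return the number of cells per cell type, or raise if the column is missing."""
    if celltype_key not in obs.columns:
        raise KeyError(
            f"'{celltype_key}' not found in obs; available columns: {list(obs.columns)}"
        )
    return obs[celltype_key].value_counts()


# Toy stand-in for Signature.obs (hypothetical cells and labels).
toy_obs = pd.DataFrame(
    {"celltype": ["T cell", "B cell", "T cell", "Tumor"]},
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)
counts = summarize_celltypes(toy_obs, "celltype")
print(counts)
```

With real data, the call would be `summarize_celltypes(Signature.obs, 'celltype_key')`, using whichever column name your dataset actually has.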

5. Deconvolution

Deconvolution is used to estimate cell-type-specific expression from bulk data.

Statescope_model.Deconvolution()

After deconvolution, you can extract the estimated fraction of each cell type in each sample using:

Fractions = Statescope_model.Fractions  # shape: [N_samples x N_cell_types]
print(Fractions.head())
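A quick plausibility check on the result can catch problems early. The sketch below assumes that each row of the fraction matrix holds the cell-type fractions of one sample and that those fractions sum to one; both points are assumptions to verify against your own output. The helper `rows_sum_to_one` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def rows_sum_to_one(fractions: pd.DataFrame, tol: float = 1e-6) -> pd.Series:
    """Return a boolean Series that is True where a row sums to 1 within tol."""
    row_sums = fractions.sum(axis=1)
    return pd.Series(np.isclose(row_sums, 1.0, atol=tol), index=fractions.index)


# Toy stand-in for Statescope_model.Fractions (hypothetical values).
toy_fractions = pd.DataFrame(
    {"T cell": [0.2, 0.5], "B cell": [0.3, 0.1], "Tumor": [0.5, 0.3]},
    index=["sample_1", "sample_2"],
)
ok = rows_sum_to_one(toy_fractions)
dominant = toy_fractions.idxmax(axis=1)  # most abundant cell type per sample
print(ok)
print(dominant)
```

Here `sample_2` is deliberately constructed to sum to 0.9, so it is flagged as not summing to one.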

6. Refinement

Refinement improves the estimates of cell-type-specific gene expression profiles obtained from deconvolution.

Statescope_model.Refinement()

# You can run the refinement multiple times if desired
Statescope_model.Refinement()

The refined gene expression profiles for each cell type can be accessed with Extract_GEX:

from Statescope import Extract_GEX
# Replace 'Celltype' with the name of the cell type you want to extract
Extract_GEX(Statescope_model, 'Celltype')

7. Cell State Discovery

To discover potential sub-states or subpopulations within each cell type, run:

Statescope_model.StateDiscovery()

After running state discovery, you can extract information about the discovered sub-states, such as their gene loadings, from the model. For example:

from Statescope import Extract_StateLoadings
Extract_StateLoadings(Statescope_model)

(Make sure to check the Statescope documentation for more details on these attributes.)
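The imports at the top of this tutorial include pickle, which is a natural way to store results so that the long-running steps do not have to be repeated. The sketch below round-trips a plain dictionary as a stand-in; whether a Statescope model object pickles cleanly is an assumption to verify, and the helper names and file name are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Serialize obj to path with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Load a pickled object from path. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Toy stand-in for a results object (hypothetical content).
toy_results = {"TumorType": "NSCLC", "n_states": {"T cell": 2}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "statescope_results.pkl")
    save_object(toy_results, path)
    restored = load_object(path)

print(restored == toy_results)
```

With real results, the call would be along the lines of `save_object(Statescope_model, 'Statescope_model.pkl')`.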

8. Visualization

8.1 Visualizing Fractions

To quickly visualize the fraction matrix, you can use a heatmap function provided by Statescope:

from Statescope import Heatmap_Fractions

Heatmap_Fractions(Statescope_model)

This generates a heatmap of the estimated cell-type fractions across samples.

8.2 Visualizing Purified Gene Expression Profiles

from Statescope import Heatmap_GEX

# Replace 'Cell Type' with the name of the cell type you want to plot
Heatmap_GEX(Statescope_model, 'Cell Type')

8.3 Visualizing Top Genes Per Cell State

Use the top_genes argument to choose how many genes per state are shown in the bar plot.

from Statescope import BarPlot_StateLoadings
# Example Usage
BarPlot_StateLoadings(Statescope_model, top_genes=1)
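To illustrate what selecting the top genes per state means, the sketch below ranks genes by their loading within each state of a toy matrix. It assumes a genes x states layout and a ranking by largest loading value; how BarPlot_StateLoadings actually ranks genes is not confirmed here, and the helper `top_genes_per_state` is hypothetical.

```python
import pandas as pd


def top_genes_per_state(loadings: pd.DataFrame, top_genes: int = 1) -> dict:
    """Map each state (column) to its top_genes genes with the largest loadings."""
    return {
        state: loadings[state].nlargest(top_genes).index.tolist()
        for state in loadings.columns
    }


# Toy genes x states loading matrix (hypothetical values).
toy_loadings = pd.DataFrame(
    {"state_1": [0.9, 0.1, 0.4], "state_2": [0.2, 0.8, 0.5]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)
top = top_genes_per_state(toy_loadings, top_genes=2)
print(top)
```

Raising `top_genes` lengthens each per-state list in the same way that it adds bars to the plot.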

9. Summary

By following these steps, you will:

Install and set up your environment.
Load your bulk data (or use the provided test dataset).
Initialize Statescope with the appropriate TumorType.
Perform Deconvolution to estimate cell-type fractions.
Refine those estimates for improved accuracy.
Discover sub-states (cell state discovery).
Visualize your results through heatmaps and other methods.

Feel free to adjust the code to fit your data structure, directory organization, or specific analysis needs. For more advanced usage, please consult the official Statescope documentation or check additional examples in the repository.
