Input Arguments
This page outlines the required inputs and optional parameters for running Statescope.
Required Datasets
Signature Matrices
The signature matrix defines the gene expression profiles of different cell types.
Available Options for Signature Matrices
There are two ways to specify the signature matrix:
Option 1: Using Pre-processed Signatures
- Statescope provides pre-processed signatures for various tumor types.
- To use these signatures, specify the TumorType and the number of cell types (Ncelltypes).
- Available options for TumorType and Ncelltypes are listed on the Processed Signatures page.
Example using pre-processed signatures:
Statescope_model = Initialize_Statescope(Bulk, TumorType='PBMC', Ncelltypes=7, Ncores=40)
- TumorType='PBMC' selects the pre-processed PBMC signature.
- Ncelltypes=7 specifies the number of cell types to use.
- Ncores=40 sets the number of CPU cores to allocate.
Option 2: Using Your Own Single-Cell RNA Data
- Users can also provide their own single-cell data in .h5ad format.
- The cell type annotations should be stored under the key specified by celltype_key.
Example using a custom signature matrix:
Statescope_model = Initialize_Statescope(
Bulk,
Signature=Signature,
celltype_key='leiden',
Ncores=40
)
- Bulk is the bulk RNA-seq dataset (transposed).
- Signature is the custom signature matrix derived from single-cell data.
- celltype_key='leiden' specifies the annotation key in the .h5ad file.
⚠️ Note:
- Single-cell data should be preprocessed (filtered, quality-controlled, and normalized) before use.
- Statescope then applies its own internal normalization and preprocessing automatically.
- Make sure the cell type annotations exist in .obs under the column named by celltype_key.
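Before initializing, it can help to confirm that the annotation column is actually present. The sketch below is a minimal check, assuming `obs` is the `.obs` DataFrame of your AnnData object (e.g. `adata.obs`); `check_celltype_key` is a hypothetical helper, not part of Statescope.

```python
import pandas as pd


def check_celltype_key(obs: pd.DataFrame, celltype_key: str) -> pd.Series:
    """Hypothetical helper (not a Statescope function): verify that
    `celltype_key` is a column of `obs` with no missing labels, and
    return the number of cells per cell type."""
    if celltype_key not in obs.columns:
        raise KeyError(
            f"celltype_key {celltype_key!r} not found in .obs; "
            f"available columns: {list(obs.columns)}"
        )
    labels = obs[celltype_key]
    n_missing = int(labels.isna().sum())
    if n_missing:
        raise ValueError(f"{n_missing} cells have no {celltype_key!r} annotation")
    # Cells per annotated cell type, handy for spotting very small clusters
    return labels.value_counts()


# Toy .obs table; with real data, pass adata.obs instead
obs = pd.DataFrame({"leiden": ["0", "0", "1", "2", "1"]})
print(check_celltype_key(obs, "leiden"))
```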
Bulk Gene Expression Data
- Y: An Ngene by Nsample matrix containing bulk gene expression data.
- Should be provided in linear scale (without log-transformation).
- Format: Ideally, a pandas DataFrame.
Example format in Python:
import pandas as pd
Bulk = pd.read_csv("bulk_expression.csv", index_col=0)
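A quick sanity check on the loaded matrix can catch the most common input mistakes: non-numeric columns, missing or negative values, and data that has already been log-transformed. This is a minimal sketch; `check_bulk_matrix` is a hypothetical helper, and the log-scale test is only a heuristic with an assumed threshold of 50 (log-transformed values rarely get that large, while linear-scale counts or TPMs usually do).

```python
import numpy as np
import pandas as pd


def check_bulk_matrix(bulk: pd.DataFrame, log_max_threshold: float = 50.0) -> None:
    """Hypothetical sanity check for a genes x samples bulk matrix.

    The log-scale test is a heuristic only: the threshold of 50 is an
    assumption, not a Statescope requirement.
    """
    values = bulk.to_numpy(dtype=float)  # raises ValueError on non-numeric entries
    if np.isnan(values).any():
        raise ValueError("bulk matrix contains missing values")
    if (values < 0).any():
        raise ValueError("bulk matrix contains negative values; expected linear scale")
    if values.max() < log_max_threshold:
        raise ValueError(
            f"max value {values.max():.2f} < {log_max_threshold}; "
            "data may be log-transformed"
        )


# Toy genes x samples matrix; with real data, call check_bulk_matrix(Bulk)
bulk = pd.DataFrame(
    {"sample1": [0.0, 150.0, 2300.0], "sample2": [5.0, 80.0, 1900.0]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)
check_bulk_matrix(bulk)  # passes silently
```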
Expected Cell Fractions (Optional)
- Expectation: An Nsample by Ncelltype matrix.
- Specifies prior expectations of cell type proportions (used by OncoBLADE).
- Format: Ideally, a pandas DataFrame.
Example format in Python:
expected_fractions = pd.read_csv("expected_cell_fractions.csv", index_col=0)
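Because the prior is indexed by sample while the bulk matrix has samples as columns, it is worth aligning the two before use. The sketch below assumes the expected fractions are proportions that should sum to about 1 per sample; `align_expected_fractions` is a hypothetical helper, and it does not check that the columns match the cell types in your signature.

```python
import pandas as pd


def align_expected_fractions(expected: pd.DataFrame, bulk: pd.DataFrame,
                             tol: float = 1e-3) -> pd.DataFrame:
    """Hypothetical helper: reorder an Nsample x Ncelltype prior so its rows
    follow the sample columns of a genes x samples bulk matrix, and check
    that each row sums to ~1 (assumes the prior holds proportions)."""
    missing = set(bulk.columns) - set(expected.index)
    if missing:
        raise KeyError(f"no expected fractions for samples: {sorted(missing)}")
    aligned = expected.loc[bulk.columns]  # same sample order as the bulk columns
    row_sums = aligned.sum(axis=1)
    bad = row_sums[(row_sums - 1).abs() > tol]
    if not bad.empty:
        raise ValueError(f"fractions do not sum to 1 for: {list(bad.index)}")
    return aligned


bulk = pd.DataFrame([[10.0, 20.0], [30.0, 40.0]],
                    index=["GENE_A", "GENE_B"], columns=["s1", "s2"])
expected = pd.DataFrame({"T cell": [0.3, 0.6], "B cell": [0.7, 0.4]},
                        index=["s2", "s1"])
aligned = align_expected_fractions(expected, bulk)
print(aligned.index.tolist())  # ['s1', 's2']
```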
Important Notes
- Bulk RNA-seq data should be in linear scale (not log-transformed).
- Signature matrices should be in log-scale.
- Single-cell .h5ad files should contain filtered, QC'd cells with cell type annotations.
- pandas DataFrames are recommended for structured inputs.
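If your bulk data were already log-transformed, the transform has to be undone before passing them to Statescope. A minimal sketch, assuming the data were transformed as log2(x + 1); `undo_log2p1` is a hypothetical helper, and for a natural-log log1p transform you would use `np.expm1` instead.

```python
import numpy as np
import pandas as pd


def undo_log2p1(log_bulk: pd.DataFrame) -> pd.DataFrame:
    """Invert a log2(x + 1) transform to recover linear-scale expression.
    Assumes log2(x + 1) was used; the result keeps index and columns."""
    return np.exp2(log_bulk) - 1  # 2**y - 1


linear = pd.DataFrame({"s1": [0.0, 99.0, 1023.0]},
                      index=["GENE_A", "GENE_B", "GENE_C"])
log_bulk = np.log2(linear + 1)
restored = undo_log2p1(log_bulk)
print(np.allclose(restored, linear))  # True
```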
Signature Preprocessing (h5ad)
Your input .h5ad should be CP10K-normalized (each cell scaled to 10,000 total counts) and log1p-transformed before initializing Statescope.
Example:
import scanpy as sc
adata = sc.read_h5ad("signature.h5ad")
sc.pp.normalize_total(adata, target_sum=1e4) # CP10K
sc.pp.log1p(adata)
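For intuition, the two scanpy calls above amount to a per-cell transform: each cell's counts are scaled to sum to 10,000, then log1p (the natural log of 1 + x) is applied. The sketch below reproduces that on a small dense cells x genes array; `cp10k_log1p` is a hypothetical name, and unlike scanpy it ignores sparse matrices and options such as `exclude_highly_expressed`.

```python
import numpy as np


def cp10k_log1p(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Scale each cell (row) to `target_sum` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # library size per cell
    # Empty cells get a scale of 0 and stay all-zero instead of dividing by zero
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    return np.log1p(counts * scale)


counts = np.array([[1, 3, 6],
                   [0, 0, 2]])
normed = cp10k_log1p(counts)
# Undoing log1p shows each cell now sums to ~1e4
print(np.expm1(normed).sum(axis=1))
```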
Saving Your Statescope Model
It is highly recommended to initialize Statescope on CPU, then save the model so it can be reopened later on CPU or GPU.
Statescope_model = Initialize_Statescope(
Bulk,
Signature=Signature,
celltype_key='leiden',
Ncores=40,
)
# Save the model for reuse (set to_cpu=True for CPU portability)
Statescope_model.save("Statescope_model.pkl", to_cpu=True)