Input Arguments
This page outlines the required inputs and optional parameters for running Statescope.
Required Datasets
Signature Matrices
The signature matrix defines the gene expression profiles of different cell types.
Available Options for Signature Matrices
There are two ways to specify the signature matrix:
Option 1: Using Pre-processed Signatures
- Statescope provides pre-processed signatures for various tumor types.
- To use these signatures, specify `TumorType` and the number of cell types (`Ncelltypes`).
- Available options for `TumorType` and `Ncelltypes` can be found in the Processed Signatures page.
Example using pre-processed signatures:
Statescope_model = Initialize_Statescope(Bulk, TumorType='PBMC', Ncelltypes=7, Ncores=40)
- `PBMC` is the pre-processed signature used.
- `Ncelltypes=7` specifies the number of cell types to use.
- `Ncores=40` defines the number of CPU cores allocated.
Option 2: Using Your Own Single-Cell RNA Data
- Users can also provide their own single-cell data in `.h5ad` format.
- The cell type annotations should be present under the key specified in `celltype_key`.
Example using a custom signature matrix:
Statescope_model = Initialize_Statescope(
    Bulk,
    Signature=Signature,
    celltype_key='leiden',
    Ncores=40
)
- `Bulk` is the bulk RNA-seq dataset (transposed).
- `Signature` is the custom signature matrix derived from single-cell data.
- `celltype_key='leiden'` specifies the annotation key in the `.h5ad` file.
⚠️ Note:
- Single-cell data should be preprocessed (filtering, QC, normalization).
- Statescope handles internal normalization and preprocessing automatically.
- Ensure the cell type annotations exist in `.obs` under the key given by `celltype_key`.
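The annotation check in the note above can be sketched before calling `Initialize_Statescope`. The helper `summarize_celltypes` below is hypothetical and not part of Statescope. It takes the `.obs` table of an AnnData object (a pandas DataFrame) and counts the cells under the chosen key. A small hand-built DataFrame stands in for `adata.obs` so the example runs without an `.h5ad` file.

```python
import pandas as pd


def summarize_celltypes(obs: pd.DataFrame, celltype_key: str) -> pd.Series:
    """Return the number of cells per cell type, or raise if the key is unusable.

    Hypothetical helper for illustration; not part of the Statescope API.
    """
    if celltype_key not in obs.columns:
        raise KeyError(
            f"'{celltype_key}' not found in .obs; available keys: {list(obs.columns)}"
        )
    labels = obs[celltype_key]
    if labels.isna().any():
        raise ValueError(
            f"{int(labels.isna().sum())} cells have no '{celltype_key}' label"
        )
    return labels.astype(str).value_counts()


# Stand-in for adata.obs; with real data one would typically use
# adata = anndata.read_h5ad(path) and pass adata.obs instead.
obs = pd.DataFrame(
    {"leiden": ["T cell", "T cell", "B cell", "Monocyte"]},
    index=["cell1", "cell2", "cell3", "cell4"],
)
counts = summarize_celltypes(obs, "leiden")
print(counts)
```

Running a check like this first surfaces a misspelled key or unlabeled cells before any cores are allocated to the model.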
Bulk Gene Expression Data
- Y: An Ngene by Nsample matrix containing bulk gene expression data.
- Should be provided in linear scale (without log-transformation).
- Format: Ideally, a pandas DataFrame.
Example format in Python:
import pandas as pd
Bulk = pd.read_csv("bulk_expression.csv", index_col=0)
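A minimal sketch of how the loaded matrix could be sanity-checked against the requirements above (numeric, non-negative, linear scale). The helper `check_bulk` and its `log_max` threshold are assumptions made for illustration, not part of Statescope. The last two lines show how to undo a `log2(x + 1)` transform, assuming that is how the data were stored.

```python
import numpy as np
import pandas as pd


def check_bulk(bulk: pd.DataFrame, log_max: float = 50.0) -> pd.DataFrame:
    """Basic sanity checks for a genes-by-samples bulk expression matrix.

    Hypothetical helper for illustration; not part of the Statescope API.
    `log_max` is a heuristic: linear-scale counts or TPM usually exceed it,
    while log-transformed values rarely do.
    """
    if not all(pd.api.types.is_numeric_dtype(dtype) for dtype in bulk.dtypes):
        raise TypeError("Bulk matrix contains non-numeric columns")
    if bulk.index.has_duplicates:
        raise ValueError("Duplicate gene identifiers in the row index")
    values = bulk.to_numpy(dtype=float)
    if (values < 0).any():
        raise ValueError("Negative values found; linear-scale expression is non-negative")
    if np.nanmax(values) < log_max:
        raise ValueError("Values look log-transformed; provide linear-scale data")
    return bulk


# Toy genes-by-samples matrix standing in for the CSV loaded above
bulk = pd.DataFrame(
    {"sample1": [120.0, 0.0, 35.5], "sample2": [98.0, 4.0, 410.2]},
    index=["GeneA", "GeneB", "GeneC"],
)
checked = check_bulk(bulk)

# If the data had been stored as log2(x + 1), this inverts the transform
logged = np.log2(bulk + 1)
restored = 2.0 ** logged - 1
```

The threshold is deliberately crude. A dataset with very low linear-scale values would trip it, so treat a failure as a prompt to inspect the data, not as proof of a log transform.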
Expected Cell Fractions (Optional)
- Expectation: An Nsample by Ncelltype matrix.
- Specifies prior expectations of cell type proportions (used by OncoBLADE).
- Format: Ideally, a pandas DataFrame.
Example format in Python:
expected_fractions = pd.read_csv("expected_cell_fractions.csv", index_col=0)
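Because the expectation matrix is Nsample by Ncelltype while the bulk matrix is Ngene by Nsample, it can help to confirm that the two tables refer to the same samples in the same order. The sketch below assumes that sample names are the bulk column labels and the expectation row labels. The helper `align_expectation` is hypothetical and not part of Statescope.

```python
import pandas as pd


def align_expectation(expectation: pd.DataFrame, bulk: pd.DataFrame) -> pd.DataFrame:
    """Reorder an Nsample-by-Ncelltype table to match the bulk sample order.

    Hypothetical helper for illustration; not part of the Statescope API.
    Assumes bulk is genes-by-samples, so sample names are bulk.columns.
    """
    missing = bulk.columns.difference(expectation.index)
    if len(missing) > 0:
        raise ValueError(f"No expected fractions for samples: {list(missing)}")
    aligned = expectation.loc[bulk.columns]
    values = aligned.to_numpy(dtype=float)
    if ((values < 0) | (values > 1)).any():
        raise ValueError("Expected fractions must lie between 0 and 1")
    return aligned


bulk = pd.DataFrame(
    {"sample1": [120.0, 0.0], "sample2": [98.0, 4.0]},
    index=["GeneA", "GeneB"],
)
# Rows are deliberately in a different order than the bulk columns
expected_fractions = pd.DataFrame(
    {"T cell": [0.3, 0.6], "B cell": [0.7, 0.4]},
    index=["sample2", "sample1"],
)
aligned = align_expectation(expected_fractions, bulk)
print(aligned)
```

Aligning by label instead of by position avoids silently pairing a sample with another sample's prior when the two files were exported in different orders.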
Important Notes
- Bulk RNA-seq data should be in linear scale (not log-transformed).
- Signature matrices should be in log-scale.
- Single-cell `.h5ad` files should contain filtered, QC'd, and annotated cell types.
- pandas DataFrames are recommended for structured inputs.