Deconvolution of NSCLC Data
This tutorial explains how to use BLADE to deconvolve bulk RNA-seq data from non-small cell lung cancer (NSCLC). It covers repository setup, signature creation, running the deconvolution on an HPC server, and visualizing the results.
Prerequisites
- Access to an HPC Cluster: The cluster should run SLURM or a similar job scheduler.
- Git and Conda Installed: Git for pulling the repository and Conda for managing environments.
- Bulk RNA-seq Data: A transcriptome matrix in .csv, .tsv, or .txt format.
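Before moving on, it can help to confirm that the bulk matrix parses cleanly. The sketch below is only an illustration: it assumes genes in rows, samples in columns, gene identifiers in the first column, and that .txt files are tab-separated. The function names read_expression_matrix and load_bulk are hypothetical and not part of BLADE.

```python
import csv
import io
from pathlib import Path

# Assumed mapping from file extension to delimiter; .txt is treated as
# tab-separated here, which may not hold for every file.
DELIMITERS = {".csv": ",", ".tsv": "\t", ".txt": "\t"}


def read_expression_matrix(handle, delimiter):
    """Parse a genes-by-samples table into (samples, {gene: [values]})."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = header[1:]  # first column is assumed to hold gene identifiers
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(
                f"{gene}: expected {len(samples)} values, got {len(values)}"
            )
        matrix[gene] = [float(v) for v in values]
    return samples, matrix


def load_bulk(path):
    """Open a file and pick the delimiter from its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in DELIMITERS:
        raise ValueError(f"unsupported extension: {suffix}")
    with open(path, newline="") as handle:
        return read_expression_matrix(handle, DELIMITERS[suffix])


# Small in-memory example with made-up values
demo = io.StringIO("gene\tS1\tS2\nCD3E\t10\t0\nEPCAM\t2.5\t7\n")
samples, matrix = read_expression_matrix(demo, "\t")
print(samples, matrix["EPCAM"])
```

A row whose length differs from the header raises an error early, which is usually easier to debug than a failure inside the deconvolution step.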
Step 1: Pull the Repository
Clone the BLADE repository to your HPC server:
git clone https://github.com/
cd BLADE
Step 2: Setup and Create Conda Environments
Ensure Conda is installed on your system. Create the required environments:
conda env create -f envs/oncoBLADE.yaml
conda env create -f envs/env_preprocess.yaml
Activate the oncoBLADE environment when running deconvolution:
conda activate oncoBLADE
Step 3: Use Processed Signatures or Create New Signatures
Option 1: Use Processed Signatures
Pre-processed signatures are provided in the repository. Use the file located at:
SignaturePipeline/output/Corrected_Signature_matrix.pickle
Ensure your bulk RNA-seq data matches the format of the signature file.
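The tutorial does not spell out what matching the format involves. One plausible check, assuming it at least requires that the bulk data and the signature share gene identifiers, is to measure how many signature genes appear in the bulk matrix. The helper gene_overlap below is hypothetical and takes plain lists of gene names, since the internal layout of the pickle file is not described here.

```python
def gene_overlap(bulk_genes, signature_genes):
    """Summarize how many signature genes are present in the bulk data."""
    bulk = set(bulk_genes)
    signature = set(signature_genes)
    shared = bulk & signature
    missing = sorted(signature - bulk)
    # Fraction of signature genes found in the bulk matrix
    coverage = len(shared) / len(signature) if signature else 0.0
    return {
        "shared": len(shared),
        "missing_from_bulk": missing,
        "coverage": coverage,
    }


# Made-up gene lists for illustration
report = gene_overlap(
    ["CD3E", "EPCAM", "PTPRC"],
    ["CD3E", "PTPRC", "MS4A1", "CD8A"],
)
print(report)
```

Low coverage often points to a mismatch in identifier type (for example gene symbols versus Ensembl IDs) rather than genuinely missing genes.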
Option 2: Create New Signatures
To create new signatures from single-cell RNA-seq data, follow these steps:
- Prepare the raw data files:
  - Counts matrix in .rds or .mtx format.
  - Metadata in .csv format.
- Edit the config.yaml file to specify the paths to your data:
data:
  counts_matrix: "data/raw/RNA_rawcounts_matrix.rds"
  metadata: "data/raw/metadata.csv"
  cell_type_column: "cell_type_major"
- Run the Snakemake pipeline to preprocess the data and create signatures:
snakemake -s Snakefile --cores 4 --use-conda
This will generate a corrected signature matrix at:
output/Corrected_Signature_matrix.pickle
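A quick pre-flight check of the configured inputs can save a failed pipeline run. The sketch below is an assumption-laden illustration, not part of the repository: check_inputs is a hypothetical helper that takes the three values from config.yaml directly and only verifies that the files exist, that the counts file has one of the extensions listed above, and that the metadata header contains the cell type column.

```python
import csv
import tempfile
from pathlib import Path


def check_inputs(counts_path, metadata_path, cell_type_column):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    counts = Path(counts_path)
    if not counts.is_file():
        problems.append(f"counts matrix not found: {counts}")
    elif counts.suffix.lower() not in {".rds", ".mtx"}:
        problems.append(f"unexpected counts format: {counts.suffix}")

    metadata = Path(metadata_path)
    if not metadata.is_file():
        problems.append(f"metadata not found: {metadata}")
    else:
        # Only the header row is needed to confirm the column exists
        with open(metadata, newline="") as handle:
            header = next(csv.reader(handle), [])
        if cell_type_column not in header:
            problems.append(f"metadata has no column '{cell_type_column}'")

    return problems


# Demonstration with temporary placeholder files (contents are made up)
with tempfile.TemporaryDirectory() as tmp:
    counts_file = Path(tmp) / "RNA_rawcounts_matrix.rds"
    counts_file.write_bytes(b"")
    metadata_file = Path(tmp) / "metadata.csv"
    metadata_file.write_text("cell_id,cell_type_major\nc1,T cell\n")

    ok = check_inputs(counts_file, metadata_file, "cell_type_major")
    bad = check_inputs(counts_file, metadata_file, "celltype")

print(ok, bad)
```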
Step 4: Run Deconvolution on an HPC Server
SLURM Job Script
Create a SLURM job script (run_deconvolution.sh) to execute the workflow:
#!/bin/bash
#SBATCH --job-name=nsclc_deconvolution
#SBATCH --output=logs/deconvolution_%j.out
#SBATCH --error=logs/deconvolution_%j.err
#SBATCH --time=24:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --partition=standard
# Load required modules
module load python/3.8 # Adjust based on your cluster configuration
# Activate Conda environment
source activate oncoBLADE
# Run deconvolution
snakemake -s Snakefile \
--cluster "sbatch --job-name={rule} --cpus-per-task={threads} --mem={resources.mem_mb} --output=logs/%x-%j.out --error=logs/%x-%j.err" \
--default-resources mem_mb=32000 threads=1 \
--use-conda \
--jobs 10 \
-p
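Create the logs directory first (for example with mkdir -p logs), because SLURM does not create missing directories for the --output and --error files. Then submit the job to SLURM: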
sbatch run_deconvolution.sh
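The --cluster string in the job script is a template: Snakemake fills in the braced placeholders for each job before calling sbatch. The sketch below imitates that substitution with Python's str.format to show what a rendered command might look like. It is not Snakemake's own code, and the rule name run_deconvolution and the helper render_cluster_command are hypothetical.

```python
from types import SimpleNamespace

# Same template as the --cluster argument in the job script
CLUSTER_TEMPLATE = (
    "sbatch --job-name={rule} --cpus-per-task={threads} "
    "--mem={resources.mem_mb} --output=logs/%x-%j.out --error=logs/%x-%j.err"
)


def render_cluster_command(rule, threads, mem_mb):
    """Fill the braced placeholders the way a per-job substitution would."""
    # {resources.mem_mb} uses attribute access, so a namespace stands in
    # for the job's resources object
    resources = SimpleNamespace(mem_mb=mem_mb)
    return CLUSTER_TEMPLATE.format(rule=rule, threads=threads, resources=resources)


command = render_cluster_command("run_deconvolution", threads=8, mem_mb=32000)
print(command)
```

The %x and %j patterns are left untouched because SLURM expands them itself, to the job name and job ID. A bare number passed to --mem is interpreted by sbatch as megabytes, which is why mem_mb can be passed through unchanged.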
Step 5: Visualize Results
The workflow generates several output files, including visualizations. To analyze and visualize the results:
Deconvolution Outputs:
- output/torch/oncoBLADE_output.pickle: Contains deconvolution results.
- graphs/estimated_fractions.png: Visualization of estimated cell fractions.
Python Script for Custom Visualization
Use the following script to load and plot the deconvolution results:
import pickle
import pandas as pd
import matplotlib.pyplot as plt
# Load deconvolution results
with open('output/torch/oncoBLADE_output.pickle', 'rb') as f:
    results = pickle.load(f)
# Convert to DataFrame
fractions = pd.DataFrame(results['cell_fractions'])
fractions.index = results['samples']
fractions.columns = results['cell_types']
# Plot estimated cell fractions
fractions.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.title('Estimated Cell Fractions')
plt.ylabel('Fraction')
plt.xlabel('Samples')
plt.legend(loc='upper right', bbox_to_anchor=(1.25, 1))
plt.tight_layout()
plt.savefig('graphs/custom_estimated_fractions.png')
plt.show()
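Before interpreting the plot, a simple sanity check is to confirm that the estimated fractions for each sample add up to one. The sketch below assumes the same keys as the script above (samples, cell_fractions), one row per sample, and that fractions are meant to sum to 1. The helper name rows_not_summing_to_one and the example values are made up for illustration.

```python
import math


def rows_not_summing_to_one(samples, fractions, tol=1e-6):
    """Return {sample: total} for rows whose fractions do not sum to 1."""
    flagged = {}
    for sample, row in zip(samples, fractions):
        total = math.fsum(row)  # fsum limits floating-point accumulation error
        if not math.isclose(total, 1.0, abs_tol=tol):
            flagged[sample] = total
    return flagged


# Stand-in for the loaded pickle, with made-up values
results = {
    "samples": ["S1", "S2"],
    "cell_types": ["T cell", "Tumor"],
    "cell_fractions": [[0.25, 0.75], [0.5, 0.25]],
}

flagged = rows_not_summing_to_one(results["samples"], results["cell_fractions"])
print(flagged)
```

An empty result means every sample passed. Flagged samples are worth inspecting before drawing conclusions from the stacked bar chart.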