AUCell & SCENIC Workflow
This section demonstrates how to perform gene set analysis using R, binarize clusters, and compute Jensen-Shannon Divergence (JSD) for comparing modules and clusters in gene expression data.
hub_df was generated from following the tutorial here: hdWGCNA. Scroll down to section called “Getting the module assignment table”.
You need to download AUCell, dplyr, SCENIC, and BiocParallel.
Source Example workflow : Preparing the input for FOX (based on: https://rdrr.io/github/aertslab/AUCell/man/AUCell_exploreThresholds.html)
You must install these R packages:
library(AUCell)
library(SCENIC)
library(dplyr)
Prepare expression matrix from your Seurat object:
exprMatrix <- Seurat_obj[["RNA"]]$counts
Previous Steps in the Workflow
Step 1: Build cell rankings
cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE)
Step 2: Calculate AUC
Use hdWGCNA output or define manually.
Calculate the hub_df from the vignette and plug in your module eigengenes here!
# assuming you calculated the gene sets and have a copy of them... see GetHubGenes(seurat_obj, n_hubs = 10) from hdWGCNA
# shape of hub_df
# gene_name module kME
# 1 Gene1 black 0.3711414
# 2 Gene2 blue 0.3694937
# 3 Gene3 blue 0.3318094
# 4 Gene4 black 0.3304103
# 5 Gene5 black 0.3312313
# df_grouped = hub_df %>% group_by(module) %>% summarize(genes_in_module = list(gene_name), .groups = "drop")
# Create gene sets for each module
# geneSets <- lapply(1:nrow(df_grouped), function(i) {
# GeneSet(df_grouped$genes_in_module[[i]], setName = as.character(df_grouped$module[i]))
# })
## this is how you upload them to AUCell... you can plug in each your gene sets here
geneSets <- list(
black = c("Gene1", "Gene5", "Gene4"),
blue = c("Gene3", "Gene2")
)
cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings)
Step 3: Assign cells
set.seed(123)
thresholds <- AUCell_exploreThresholds(cells_AUC, plotHist=FALSE) #no cells
# Assign cells based on thresholds
cells_assignment <- AUCell_exploreThresholds(
cells_AUC,
plotHist=FALSE,
assignCells=TRUE
)
# Extract threshold values for each regulon
Thresholds_forAUCell <- getThresholdSelected(cells_assignment)
Export Thresholds to .tsv
regulon_df <- data.frame(
regulon = names(Thresholds_forAUCell),
threshold = as.numeric(Thresholds_forAUCell),
stringsAsFactors = FALSE
)
# Must include "3.5_" in the filename for SCENIC compatibility
write.table(
regulon_df,
file = "3.5_regulon_scores_thresholds.tsv",
sep = "\t",
row.names = FALSE,
quote = FALSE
)
Get AUC and Generate RSS
cells_test_RAS <- getAUC(cells_AUC)
# Take labels from Seurat object
cellInfo <- data.frame(seuratCluster = Idents(Seurat_obj))
# Optional: Remove low-confidence regulons
cells_AUC <- cells_AUC[!grepl("extended", rownames(cells_AUC)), ]
# Calculate RSS
rss <- calcRSS(
AUC = getAUC(cells_AUC),
cellAnnotation = cellInfo[colnames(cells_AUC), "seuratCluster"]
)
write.csv(rss, file = "rss_values_.csv")
# Merge RAS with metadata
Seurat_obj@meta.data <- cbind(Seurat_obj@meta.data, t(cells_test_RAS))
write.csv(Seurat_obj@meta.data, file = "RAS_values_dataset.csv")
Usage Example
To run FOX, you’ll need to prepare your data (such as RSS matrices and metadata) and pass it to the class. Here’s an example of how to initialize and use FOX:
See examples: Test_group_supplemental.html
from FOXREG import ComparisonTree
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
# Read in the data
data = pd.read_csv("rss_values_.csv") # RSS values
df_RAS = pd.read_csv("RAS_values_dataset.csv") # AUC metadata
# Define labels for your comparison
other_clusters_to_compare = data.columns[1:].tolist()
# Initialize the ComparisonTree with your data
comparison = ComparisonTree(
"<baseline cluster>", df_RAS, "newLabels", data, other_clusters_to_compare, "Unnamed: 0",
"3.5_regulon_scores_thresholds.tsv"
)