AUCell & SCENIC Workflow

This section demonstrates how to perform gene set analysis using R, binarize clusters, and compute Jensen-Shannon Divergence (JSD) for comparing modules and clusters in gene expression data.

hub_df was generated from following the tutorial here: hdWGCNA. Scroll down to section called “Getting the module assignment table”.

You need to download AUCell, dplyr, SCENIC, and BiocParallel.

Source Example workflow : Preparing the input for FOX (based on: https://rdrr.io/github/aertslab/AUCell/man/AUCell_exploreThresholds.html)

You must install these R packages:

library(AUCell)
library(SCENIC)
library(dplyr)

Prepare expression matrix from your Seurat object:

exprMatrix <- Seurat_obj[["RNA"]]$counts

Previous Steps in the Workflow

Step 1: Build cell rankings

cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE)

Step 2: Calculate AUC

Use hdWGCNA output or define manually.

Calculate the hub_df from the vignette and plug in your module eigengenes here!

#  assuming you calculated the gene sets and have a copy of them... see GetHubGenes(seurat_obj, n_hubs = 10) from hdWGCNA
#  shape of hub_df

#   gene_name  module       kME
# 1      Gene1 black    0.3711414
# 2      Gene2 blue     0.3694937
# 3      Gene3 blue     0.3318094
# 4      Gene4 black    0.3304103
# 5      Gene5 black    0.3312313

# df_grouped = hub_df %>% group_by(module) %>% summarize(genes_in_module = list(gene_name), .groups = "drop")

 # Create gene sets for each module
 # geneSets <- lapply(1:nrow(df_grouped), function(i) {
 #    GeneSet(df_grouped$genes_in_module[[i]], setName = as.character(df_grouped$module[i]))
 # })

## this is how you upload them to AUCell... you can plug in each your gene sets here
geneSets <- list(
  black = c("Gene1", "Gene5", "Gene4"),
  blue  = c("Gene3", "Gene2")
)

cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings)

Step 3: Assign cells

set.seed(123)
thresholds <- AUCell_exploreThresholds(cells_AUC, plotHist=FALSE) #no cells

# Assign cells based on thresholds
cells_assignment <- AUCell_exploreThresholds(
  cells_AUC,
  plotHist=FALSE,
  assignCells=TRUE
)

# Extract threshold values for each regulon
Thresholds_forAUCell <- getThresholdSelected(cells_assignment)

Export Thresholds to .tsv

regulon_df <- data.frame(
  regulon = names(Thresholds_forAUCell),
  threshold = as.numeric(Thresholds_forAUCell),
  stringsAsFactors = FALSE
)

# Must include "3.5_" in the filename for SCENIC compatibility
write.table(
  regulon_df,
  file = "3.5_regulon_scores_thresholds.tsv",
  sep = "\t",
  row.names = FALSE,
  quote = FALSE
)

Get AUC and Generate RSS

cells_test_RAS <- getAUC(cells_AUC)

# Take labels from Seurat object
cellInfo <- data.frame(seuratCluster = Idents(Seurat_obj))

# Optional: Remove low-confidence regulons
cells_AUC <- cells_AUC[!grepl("extended", rownames(cells_AUC)), ]

# Calculate RSS
rss <- calcRSS(
  AUC = getAUC(cells_AUC),
  cellAnnotation = cellInfo[colnames(cells_AUC), "seuratCluster"]
)

write.csv(rss, file = "rss_values_.csv")

# Merge RAS with metadata
Seurat_obj@meta.data <- cbind(Seurat_obj@meta.data, t(cells_test_RAS))
write.csv(Seurat_obj@meta.data, file = "RAS_values_dataset.csv")

Usage Example

To run FOX, you’ll need to prepare your data (such as RSS matrices and metadata) and pass it to the class. Here’s an example of how to initialize and use FOX:

See examples: Test_group_supplemental.html

from FOXREG import ComparisonTree
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

 # Read in the data
 data = pd.read_csv("rss_values_.csv")  # RSS values
 df_RAS = pd.read_csv("RAS_values_dataset.csv")  # AUC metadata

 # Define labels for your comparison
 other_clusters_to_compare = data.columns[1:].tolist()

 # Initialize the ComparisonTree with your data
 comparison = ComparisonTree(
     "<baseline cluster>", df_RAS, "newLabels", data, other_clusters_to_compare, "Unnamed: 0",
     "3.5_regulon_scores_thresholds.tsv"
 )