Stereo-seq Mouse Embryonic Development Timecourse
Key features:
- Apply Gaussian smoothing to address data noise and sparsity across datasets
- Construct a gene co-expression network across samples using smoothed datasets
- Group genes into multi-sample spatial modules with hard and soft thresholding on the co-expression network
- Identify stable vs. dynamic gene expression distributions between dataset pairs using second-order correlations
import anndata as ad
import numpy as np
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt
import seaborn as sns
import smoothie
smoothie.suppress_warnings() # Suppress a few warnings for cleaner output
Load Multiple Datasets¶
Smoothie requires input data in the AnnData format. A guide to constructing the AnnData data structure from both imaging-based and sequencing-based spatial transcriptomics outputs can be found in /guides/convert_to_anndata.md.
# Bash code to generate AnnData file from STOmics bin1 output: https://db.cngb.org/stomics/mosta/download/
# python -m smoothie.convert_to_anndata /path/to/E16.5_E1S1_GEM_bin1.tsv.gz \
# --x-col x \
# --y-col y \
# --gene-col geneID \
# --count-col MIDCounts \
# --format tsv.gz \
# --chunksize 1000000 \
# --output mouse_E16.5_E1S1_raw_dnb.h5ad
# Load your spatial transcriptomics datasets
E13_adata = ad.read_h5ad("/path/to/mouse_E13.5_E1S1_raw_dnb.h5ad")
E14_adata = ad.read_h5ad("/path/to/mouse_E14.5_E1S1_raw_dnb.h5ad")
E15_adata = ad.read_h5ad("/path/to/mouse_E15.5_E1S1_raw_dnb.h5ad")
E16_adata = ad.read_h5ad("/path/to/mouse_E16.5_E1S1_raw_dnb.h5ad")
# Create list of datasets and assign dataset names
adata_list = [E13_adata, E14_adata, E15_adata, E16_adata]
dataset_names = ["E13.5", "E14.5", "E15.5", "E16.5"]
Quality Control¶
The table below shows suggested QC values given the resolution of your AnnData spatial coordinates.
- Note: If you have cell-segmented imaging data, use the "Cell-sized" QC settings. If you have true individual transcripts or other fine-resolution data, use the Fine / Imaging QC settings.
| Resolution | min_counts (filter_cells) | min_counts (filter_genes) | min_cells (filter_genes) |
|---|---|---|---|
| "Cell-sized" (8-10 µm) | 10 - 100 | 100 - 500 | 10 - 100 |
| Binned (20-50 µm) | 50 - 500 | 100 - 500 | 10 - 100 |
| Fine / Imaging (< 1-2 µm) | 1 | 100 - 500 | 10 - 100 |
# QC filtering for each dataset
for i in range(len(adata_list)):
    # Filter spots with low UMI counts
    sc.pp.filter_cells(adata_list[i], min_counts=1)  # Only remove DNA nanoballs with 0 UMIs!
    # Filter genes with low counts/unique spots
    sc.pp.filter_genes(adata_list[i], min_counts=100)
Normalization¶
Normalization depends on the resolution of your AnnData spatial coordinates.
- Note: If you have cell-segmented imaging data, use the "Cell-sized" normalization method. If you have true individual transcripts or other fine-resolution data, use the Fine / Imaging normalization method.
| Resolution | Normalization Method |
|---|---|
| "Cell sized" or Binned Data (8-50 micron resolution) | CPT + Log1p normalization |
| Fine / Imaging (<1-2 micron resolution) | Log1p normalization only |
# Normalization for each dataset
for i in range(len(adata_list)):
    # # CHOOSE: CPT + Log1p normalization (8-50 micron resolution data)
    # sc.pp.normalize_total(adata_list[i], target_sum=1e3)
    # sc.pp.log1p(adata_list[i])

    # CHOOSE: Log1p normalization only (<1-2 micron resolution data)
    sc.pp.log1p(adata_list[i])
Step 1: Run Gaussian Smoothing¶
Here, we run Gaussian smoothing on the spatial datasets to address data sparsity and noise. Key parameters for the function smoothie.run_parallelized_smoothing:
gaussian_sd: float
- The standard deviation for the Gaussian distribution, which determines smoothing degree. Generally across high-resolution S.T. platforms, a gaussian_sd corresponding to 20-30 microns is ideal.
- gaussian_sd is calculated as target_microns * micron_to_unit_conversion
- For Slide-seq, micron_to_unit_conversion = 1.5456 (1 micron equals 1.5456 Slide-seq spatial units)
- For Stereo-seq, micron_to_unit_conversion = 2 (1 micron equals 2 Stereo-seq spatial units)
- A table of micron_to_unit_conversion values for spatial transcriptomics platforms is included in /guides/micron_to_unit_conversion_table.md
grid_based_or_not: boolean
- True = grid-based smoothing: smooth only at imposed hexagonal grid points. Good for subcellular to "cell-sized" resolution data (<1-10 micron).
- False = in-place smoothing: smooth at every spatial location in the dataset. Good for "cell-sized" to binned resolution data (10-50 micron).
min_spots_under_gaussian: int
- Minimum required number of data points within a radius of (3 * gaussian_sd) for smoothing to occur at a given location.
- Typical values are 10-100. Raise the value if low-density regions around the tissue perimeter appear noisy in modules. Lower the value if you have a large spot size (e.g., original Visium 100-micron resolution data or binned 20-50 micron resolution data).
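Conceptually, grid-based Gaussian smoothing replaces the expression at each query location with a distance-weighted average of nearby spots. The following minimal numpy sketch (an illustration of the idea, not Smoothie's actual implementation) shows this for one gene at one query point, including the min_spots_under_gaussian density check:

```python
import numpy as np

def smooth_at_point(query_xy, coords, values, gaussian_sd, min_spots_under_gaussian=100):
    """Gaussian-weighted average of one gene's values around query_xy.

    coords: (n_spots, 2) spatial coordinates; values: (n_spots,) counts for one gene.
    Returns np.nan when fewer than min_spots_under_gaussian spots lie within
    3 * gaussian_sd of the query location (the low-density skip rule).
    """
    dists = np.linalg.norm(coords - query_xy, axis=1)
    in_radius = dists <= 3 * gaussian_sd
    if in_radius.sum() < min_spots_under_gaussian:
        return np.nan  # too few neighbors here; skip this location
    weights = np.exp(-0.5 * (dists[in_radius] / gaussian_sd) ** 2)
    return float(np.sum(weights * values[in_radius]) / np.sum(weights))

# Toy data: 200 spots with a noisy left-to-right expression gradient
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))
values = coords[:, 0] + rng.normal(0, 5, size=200)

print(smooth_at_point(np.array([50.0, 50.0]), coords, values, gaussian_sd=40))
```

Smoothie evaluates this kind of weighted average at every grid point (grid-based) or every spot (in-place), in parallel chunks.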
# Determine gaussian_sd ---------------------------------------------------
target_microns = 20 # 20-30 microns is ideal for smoothing
micron_to_unit_conversion = 2 # IMPORTANT! For Stereo-seq, this value is 2.
# Select other parameters --------------------------------------------------
grid_based_or_not = True # Select True or False
min_spots_under_gaussian = 100 # Adjust as needed
# -------------------------------------------------------------------------
sm_adata_list = []
for i in range(len(adata_list)):
    print(f'Smoothing dataset: {dataset_names[i]}')
    sm_adata = smoothie.run_parallelized_smoothing(
        adata_list[i],
        grid_based_or_not=grid_based_or_not,
        gaussian_sd=target_microns * micron_to_unit_conversion,
        min_spots_under_gaussian=min_spots_under_gaussian
    )
    sm_adata_list.append(sm_adata)
Smoothing dataset: E13.5
Checking dataset point density... Fitting grid to tissue coordinates...
Variation in X-direction: 18399, Y-direction: 14799. Number of points in fitted grid: 138467
Gaussian smoothing will run in 36 chunks. Total runtime for grid-based Gaussian smoothing: 696.46 seconds.

Smoothing dataset: E14.5
Checking dataset point density... Fitting grid to tissue coordinates...
Variation in X-direction: 22749, Y-direction: 15649. Number of points in fitted grid: 183814
Gaussian smoothing will run in 36 chunks. Total runtime for grid-based Gaussian smoothing: 859.55 seconds.

Smoothing dataset: E15.5
Checking dataset point density... Fitting grid to tissue coordinates...
Variation in X-direction: 23449, Y-direction: 15099. Number of points in fitted grid: 203215
Gaussian smoothing will run in 36 chunks. Total runtime for grid-based Gaussian smoothing: 825.06 seconds.

Smoothing dataset: E16.5
Checking dataset point density... Fitting grid to tissue coordinates...
Variation in X-direction: 25549, Y-direction: 17467. Number of points in fitted grid: 218394
Gaussian smoothing will run in 36 chunks. Total runtime for grid-based Gaussian smoothing: 875.32 seconds.
Step 2: Measure and Cluster Gene Co-expression Network Across Samples¶
To measure multi-dataset spatial gene correlations, we concatenate the smoothed spatial datasets into one large count matrix. We then compute the pairwise Pearson correlation coefficient between all multi-dataset smoothed gene surface pairs.
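As a rough numpy illustration of this step (Smoothie's own smoothie.concatenate_smoothed_matrices and smoothie.compute_correlation_matrix additionally handle gene-name alignment and memory management), stacking spots from aligned smoothed matrices and correlating the gene columns looks like:

```python
import numpy as np

# Two toy smoothed matrices (spots x genes) with gene columns already aligned
rng = np.random.default_rng(1)
X1 = rng.random((100, 3))
X2 = rng.random((80, 3))
X2[:, 1] = X2[:, 0]  # in dataset 2, gene 1 tracks gene 0 spatially

# Stack spots from all datasets into one matrix, then correlate gene columns
X_concat = np.vstack([X1, X2])           # shape (180, 3)
pearsonR_mat = np.corrcoef(X_concat.T)   # (3, 3) gene-gene PCC matrix
print(np.round(pearsonR_mat, 2))
```

Because spots from all datasets contribute to each pairwise PCC, a high correlation requires genes to co-vary spatially across the whole timecourse, not just in one sample.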
Next, we evaluate optimal parameter choices for network construction and clustering. The function smoothie.select_clustering_params calculates network clustering statistics across different choices for pcc_cutoff and clustering_power.
Choose values for pcc_cutoff and clustering_power that balance the following goals:
- High gene margin scores for high gene module confidence
- Stable number of modules (n_clusters) compared to other pcc_cutoff values
- High gene recall for more gene-rich analyses
Refer to /docs/tutorials for examples on selecting these parameters.
# Create a concatenated smoothed count matrix X_concat, with gene names for each column
sm_adata_X_concat, gene_names_concat = smoothie.concatenate_smoothed_matrices(sm_adata_list)
# Compute Pearson correlation matrix (This step can take a while... depending on sm_adata_X_concat size)
pearsonR_mat_concat, _ = smoothie.compute_correlation_matrix(sm_adata_X_concat)
concatenate_smoothed_matrices running...
compute_correlation_matrix running...
Total runtime for compute_correlation_matrix: 5854.41 seconds.
# Clustering hyperparameter selection
smoothie.select_clustering_params(
gene_names=gene_names_concat, # don't change
pearsonR_mat=pearsonR_mat_concat, # don't change
output_folder=None,
pcc_cutoffs=[0.3, 0.4, 0.5, 0.6, 0.7, 0.8], # lower pcc_cutoffs require more runtime
clustering_powers=[1, 3, 5, 7, 9],
min_genes_for_module=5
)
Running network clustering with parameters:
PCC hard cutoff: 0.3, clustering soft power: 1, 3, 5, 7, 9
PCC hard cutoff: 0.4, clustering soft power: 1, 3, 5, 7, 9
PCC hard cutoff: 0.5, clustering soft power: 1, 3, 5, 7, 9
PCC hard cutoff: 0.6, clustering soft power: 1, 3, 5, 7, 9
PCC hard cutoff: 0.7, clustering soft power: 1, 3, 5, 7, 9
PCC hard cutoff: 0.8, clustering soft power: 1, 3, 5, 7, 9
Optionally, data shuffling and smoothing can be done to help determine the lower bound for pcc_cutoff selection. Here, use the same parameters as before in smoothie.run_parallelized_smoothing.
- Note: Creating the smoothed, shuffled versions of the datasets will double memory usage temporarily.
# p95, p99, p999 = smoothie.compute_shuffled_correlation_percentiles(
# adata_list,
# grid_based_or_not=True, # Choose same value as in smoothie.run_parallelized_smoothing
# gaussian_sd=target_microns * micron_to_unit_conversion, # Choose same value as in smoothie.run_parallelized_smoothing
# min_spots_under_gaussian=100, # Choose same value as in smoothie.run_parallelized_smoothing
# seed=0
# )
Based on the outputs above, we can select pcc_cutoff and clustering_power hyperparameters for smoothie.make_spatial_network.
pcc_cutoff: The Pearson correlation coefficient (PCC) hard threshold for network construction. Only correlations above this value are retained in the gene network. Higher cutoffs yield smaller networks with stronger average correlations.
clustering_power: A soft-thresholding exponent that controls how strongly weaker PCC values are suppressed relative to stronger ones. Higher values favor more modules.
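To make the two thresholds concrete, here is a minimal WGCNA-style sketch of how a hard cutoff and a soft power can combine on a gene-gene PCC matrix (an illustration of the idea, not Smoothie's exact implementation):

```python
import numpy as np

def threshold_network(pearsonR_mat, pcc_cutoff, clustering_power):
    """Hard-threshold then soft-threshold a gene-gene PCC matrix.

    Correlations below pcc_cutoff are dropped (hard threshold); surviving
    edge weights are raised to clustering_power (soft threshold), which
    suppresses weak edges relative to strong ones before clustering.
    """
    adj = np.where(pearsonR_mat >= pcc_cutoff, pearsonR_mat, 0.0)
    np.fill_diagonal(adj, 0.0)  # drop self-edges
    return adj ** clustering_power

pcc = np.array([[1.0, 0.9, 0.45],
                [0.9, 1.0, 0.20],
                [0.45, 0.20, 1.0]])

# With pcc_cutoff=0.4 and clustering_power=4: the 0.2 edge is dropped,
# 0.9 -> ~0.656 and 0.45 -> ~0.041, widening the gap between edge strengths.
print(threshold_network(pcc, pcc_cutoff=0.4, clustering_power=4))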
# Build network and cluster genes into modules
edge_list, node_label_df = smoothie.make_spatial_network(
pearsonR_mat=pearsonR_mat_concat, # don't change
gene_names=gene_names_concat, # don't change
pcc_cutoff=0.4,
clustering_power=4,
output_folder='./out'
)
# Filter node_label_df for only genes within modules of size 2 or more.
modules_df = node_label_df.groupby('module_label').filter(lambda x: len(x) >= 2)
# Examine spatial gene modules
modules_df
| | name | module_label | degree | weighted_degree | rescaled_weighted_degree | clustering_coeff | margin_score |
|---|---|---|---|---|---|---|---|
| 0 | Rps17 | 1 | 1722 | 937.232806 | 61.987632 | 0.322454 | 0.766591 |
| 1 | Rps5 | 1 | 1707 | 934.802515 | 60.649651 | 0.328016 | 0.813370 |
| 2 | Rps26 | 1 | 1695 | 919.585437 | 59.695018 | 0.326045 | 0.776880 |
| 3 | Rpl11 | 1 | 1742 | 945.924169 | 59.477755 | 0.316680 | 0.785971 |
| 4 | Rps24 | 1 | 1577 | 860.135441 | 59.366024 | 0.369837 | 0.826953 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 6411 | Hmga2 | 296 | 3 | 1.251878 | 0.000004 | 1.000000 | 0.280330 |
| 6412 | Rbp1 | 297 | 52 | 21.813263 | 0.000690 | 0.960030 | 0.949203 |
| 6413 | Aldh1a2 | 297 | 1 | 0.470751 | 0.000193 | 0.000000 | 1.000000 |
| 6414 | Hmga1 | 298 | 46 | 19.375366 | 0.002022 | 0.957488 | 0.994935 |
| 6415 | Hmga1b | 298 | 2 | 0.938142 | 0.001647 | 1.000000 | 0.999582 |
6416 rows × 7 columns
Finally, we can import the saved edge_list.csv and node_labels.csv files into Cytoscape for interactive network visualization! More info in /guides/cytoscape_instructions.md
Visualize Smoothed Gene Expression and Modules¶
# rotate sm_adata as needed
sm_adata_list[0] = smoothie.rotate_spatial(sm_adata_list[0], angle_degrees=-90, flip_x=True)
sm_adata_list[1] = smoothie.rotate_spatial(sm_adata_list[1], angle_degrees=-90)
sm_adata_list[2] = smoothie.rotate_spatial(sm_adata_list[2], angle_degrees=90)
sm_adata_list[3] = smoothie.rotate_spatial(sm_adata_list[3], angle_degrees=-90)
# Plot a gene of interest across datasets
smoothie.plot_gene_multisample(
sm_adata_list,
dataset_names,
gene_name='Upk3b', # marker gene for mesothelial cell monolayer!
output_folder='./gene_plots',
shared_scaling=False, # Dataset independent scaling or shared 0-to-max scaling for gene plotting
spot_size=50, # Adjust for optimal visualization
dpi=600
)
# Plot gene modules across datasets
smoothie.plot_modules_multisample(
sm_adata_list,
dataset_names,
node_label_df,
output_folder='./module_plots',
shared_scaling=False, # Dataset independent scaling or shared 0-to-max scaling for module plotting
min_genes=15, # only showing large modules to keep GitHub file sizes down. Choose 2 or 3 to see more!
spot_size=50, # Adjust for optimal visualization
dpi=150 # low resolution here for GitHub; choose 300-600 for better quality!
)
(Multi-sample spatial plots for Modules 1-31 are rendered here, one panel set per module.)
Step 3: Compare Gene Patterns Between Datasets (2nd-order Correlation Analysis)¶
In this section, we assess stable vs. dynamic gene patterns between datasets (without having to warp, transform, or align the mismatched spatial coordinates!).
We construct spatial gene embeddings for each dataset, where a gene's embedding consists of its correlations with the other genes in that dataset. These gene embedding features are then aligned across datasets for comparison.
The gene stability metric quantifies how consistently a gene co-varies with the same partners across datasets. Values range from -1 to 1, where higher values indicate more stable/conserved co-expression patterns. A gene stability of exactly -1.0 indicates that the gene was missing in one dataset but had a significant correlation above pcc_cutoff in the other dataset.
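As a minimal sketch of the second-order idea (an illustration consistent with the description above, not Smoothie's exact implementation), a gene's stability between two datasets can be computed as the Pearson correlation between its two correlation vectors over shared partner genes:

```python
import numpy as np

def gene_stability(embed_a, embed_b):
    """Second-order correlation: the PCC between a gene's correlation
    vector (its correlations with shared partner genes) in dataset A
    and the same gene's correlation vector in dataset B."""
    return float(np.corrcoef(embed_a, embed_b)[0, 1])

# Hypothetical correlation vectors over 5 shared partner genes
embed_E13 = np.array([0.8, 0.6, -0.2, 0.1, 0.7])
embed_E14_stable = np.array([0.75, 0.65, -0.1, 0.05, 0.8])  # same partners -> stable
embed_E14_dynamic = np.array([-0.5, 0.1, 0.9, -0.3, -0.6])  # new partners -> dynamic

print(gene_stability(embed_E13, embed_E14_stable))   # high (near 1)
print(gene_stability(embed_E13, embed_E14_dynamic))  # low (negative here)
```

Because the metric compares correlation structure rather than coordinates, no spatial registration between datasets is required.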
This analysis is wrapped within the function smoothie.run_second_order_correlation_analysis. More information is found in the API documentation.
results = smoothie.run_second_order_correlation_analysis(
sm_adata_list=sm_adata_list,
pcc_cutoff=0.4, # Use same cutoff as network
node_label_df=node_label_df, # Module assignments from network
E_max=25, # Max features per module for embeddings.
output_folder='./2nd_order_corr_out',
seed=0
)
print(f"Gene embeddings shape: {results['gene_embeddings_tensor'].shape}")
print(f"Gene stabilities shape: {results['gene_stabilities'].shape}")
Starting second-order correlation analysis pipeline... Number of datasets: 4
[Step 1/5] Computing Pearson correlation matrices for each dataset...
Dataset 1/4 (19274 genes, 138467 spots): 929.68 seconds.
Dataset 2/4 (20602 genes, 183798 spots): 1391.47 seconds.
Dataset 3/4 (20160 genes, 203215 spots): 1467.41 seconds.
Dataset 4/4 (20203 genes, 218394 spots): 1579.93 seconds.
[Step 2/5] Aligning gene correlation matrices across datasets... Aligned tensor shape: (21290, 21290, 4)
[Step 3/5] Creating gene embeddings (PCC cutoff=0.4, E_max=25)... Gene embeddings shape: (7438, 3018, 4)
[Step 4/5] Computing gene stability metrics across dataset pairs... Stability tensor shape: (4, 4, 7438)
[Step 5/5] Saving results to ./2nd_order_corr_out... Second-order correlation analysis complete.
Gene embeddings shape: (7438, 3018, 4)
Gene stabilities shape: (4, 4, 7438)
Examine Gene Stabilities Between Datasets
smoothie.plot_dataset_pair_stabilities(
gene_stabilities=results['gene_stabilities'], # don't change
dataset_names=dataset_names, # don't change
output_folder=None
)
Gene stabilities of -1 are not shown in the pairwise dataset stability violin plots.
Rank Gene Patterns from Stable to Dynamic Across All Datasets
# Look at distribution of minimum stability for each gene across dataset pairs
stability_df = smoothie.plot_gene_stability_distribution(
gene_stabilities=results['gene_stabilities'], # don't change
robust_gene_names=results['robust_gene_names'], # don't change
node_label_df=node_label_df, # don't change
output_folder=None
)
Plot a Gene and its Stability Between Datasets
GOI = 'Alb' # select your gene of interest
# Plot gene spatial expression
smoothie.plot_gene_multisample(
sm_adata_list, # don't change
dataset_names, # don't change
gene_name=GOI,
output_folder='./gene_stability_plots',
shared_scaling=False, # Dataset independent scaling or shared 0-to-max scaling for gene plotting
spot_size=50 # Adjust for optimal visualization
)
# Plot gene stabilities
smoothie.plot_gene_stability(
gene_name=GOI,
gene_stabilities=results['gene_stabilities'], # don't change
robust_gene_names=results['robust_gene_names'], # don't change
dataset_names=dataset_names, # don't change
output_folder='./gene_stability_plots'
)
GOI = 'Mkrn1' # select your gene of interest
# Plot gene spatial expression
smoothie.plot_gene_multisample(
sm_adata_list, # don't change
dataset_names, # don't change
gene_name=GOI,
output_folder='./gene_stability_plots',
shared_scaling=False, # Dataset independent scaling or shared 0-to-max scaling for gene plotting
spot_size=50 # Adjust for optimal visualization
)
# Plot gene stabilities
smoothie.plot_gene_stability(
gene_name=GOI,
gene_stabilities=results['gene_stabilities'], # don't change
robust_gene_names=results['robust_gene_names'], # don't change
dataset_names=dataset_names, # don't change
output_folder='./gene_stability_plots'
)
GOI = 'Gzma' # select your gene of interest
# Plot gene spatial expression
smoothie.plot_gene_multisample(
sm_adata_list, # don't change
dataset_names, # don't change
gene_name=GOI,
output_folder='./gene_stability_plots',
shared_scaling=False, # Dataset independent scaling or shared 0-to-max scaling for gene plotting
spot_size=50 # Adjust for optimal visualization
)
# Plot gene stabilities
smoothie.plot_gene_stability(
gene_name=GOI,
gene_stabilities=results['gene_stabilities'], # don't change
robust_gene_names=results['robust_gene_names'], # don't change
dataset_names=dataset_names, # don't change
output_folder='./gene_stability_plots'
)
GOI = 'Lor' # select your gene of interest
# Plot gene spatial expression
smoothie.plot_gene_multisample(
sm_adata_list, # don't change
dataset_names, # don't change
gene_name=GOI,
output_folder='./gene_stability_plots',
shared_scaling=False, # Dataset independent scaling or shared 0-to-max scaling for gene plotting
spot_size=50 # Adjust for optimal visualization
)
# Plot gene stabilities
smoothie.plot_gene_stability(
gene_name=GOI,
gene_stabilities=results['gene_stabilities'], # don't change
robust_gene_names=results['robust_gene_names'], # don't change
dataset_names=dataset_names, # don't change
output_folder='./gene_stability_plots'
)
Plot a Gene Module's Average Stability Between Datasets
smoothie.plot_module_stability(
module_label = 5, # Pick module label of interest
gene_stabilities=results['gene_stabilities'], # don't change
robust_gene_names=results['robust_gene_names'], # don't change
node_label_df=node_label_df, # don't change
dataset_names=dataset_names, # don't change
output_folder='./module_stability_plots'
)