Smoothie API Reference
gaussian_smoothing.py
run_parallelized_smoothing
run_parallelized_smoothing(adata, grid_based_or_not, gaussian_sd, min_spots_under_gaussian,
stride=None, grid_fitting_dist=None, num_processes=10, num_data_splits=None)
Performs parallelized Gaussian smoothing on spatial transcriptomics data.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| adata | AnnData | The input dataset. |
| grid_based_or_not | bool | True = grid-based smoothing (smooth only at hexagonal grid points); False = in-place smoothing (smooth at every spatial location). In-place is effective for cell-sized or larger resolution (10–50 µm). Grid-based is effective for subcellular to cell-sized resolution (0.5–10 µm). |
| gaussian_sd | float | Standard deviation of the Gaussian kernel in coordinate units. Use target_microns × micron_to_unit_conversion; a target of 20–40 µm is generally ideal. See docs/guides/micron_to_unit_conversion_table.md. |
| min_spots_under_gaussian | int | Minimum number of data points within a radius of 3 × gaussian_sd required for smoothing to occur at a location (typical range: 25–100). |
| stride | float, optional | Grid spacing for grid-based smoothing. Default is 1 × gaussian_sd; 0.5 × gaussian_sd gives a denser grid. |
| grid_fitting_dist | float, optional | Minimum distance a grid point must be from a spatial coordinate to be retained. Default is 0.25 × gaussian_sd. |
| num_processes | int, optional | Number of parallel processes to use. Default is 10. |
| num_data_splits | int, optional | Number of data chunks for memory-efficient processing. If None, selected automatically. |
Returns: sm_adata (AnnData) — The smoothed AnnData object.
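The two smoothing modes can be illustrated with a brute-force NumPy sketch. It is not Smoothie's implementation (which parallelizes and chunks the work via num_processes and num_data_splits). The helper names `hex_grid` and `gaussian_smooth`, the normalized weighted-average form of the kernel, and a grid spanning the bounding box are assumptions made for illustration; the grid_fitting_dist filter is not modeled.

```python
import numpy as np


def hex_grid(coords, stride):
    """Hypothetical helper: hexagonal lattice with spacing `stride` over the bounding box of `coords`."""
    (xmin, ymin), (xmax, ymax) = coords.min(axis=0), coords.max(axis=0)
    row_height = stride * np.sqrt(3) / 2  # vertical spacing between hexagonal rows
    points = []
    for i, y in enumerate(np.arange(ymin, ymax + row_height, row_height)):
        offset = stride / 2 if i % 2 else 0.0  # shift every other row by half a stride
        for x in np.arange(xmin + offset, xmax + stride, stride):
            points.append((x, y))
    return np.array(points)


def gaussian_smooth(X, coords, query, gaussian_sd, min_spots_under_gaussian):
    """Hypothetical helper: Gaussian-weighted average of X at each query location.

    Locations with fewer than `min_spots_under_gaussian` data points within
    3 * gaussian_sd are skipped (assumed behavior).
    """
    smoothed, kept = [], []
    for q in query:
        d2 = np.sum((coords - q) ** 2, axis=1)  # squared distance to every data point
        in_radius = d2 <= (3 * gaussian_sd) ** 2
        if in_radius.sum() < min_spots_under_gaussian:
            continue  # too few supporting points: no smoothed value here
        w = np.exp(-d2[in_radius] / (2 * gaussian_sd ** 2))
        smoothed.append(w @ X[in_radius] / w.sum())
        kept.append(q)
    return np.array(smoothed), np.array(kept)


rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(500, 2))  # toy spot coordinates
X = rng.poisson(1.0, size=(500, 3)).astype(float)  # toy counts for 3 genes
sd = 10.0

# In-place mode: smooth at every spatial location.
sm_inplace, _ = gaussian_smooth(X, coords, coords, sd, min_spots_under_gaussian=25)
# Grid-based mode: smooth only at hexagonal grid points (stride = 1 x gaussian_sd).
grid = hex_grid(coords, stride=sd)
sm_grid, grid_kept = gaussian_smooth(X, coords, grid, sd, min_spots_under_gaussian=25)
```

Lowering the stride to 0.5 × gaussian_sd roughly quadruples the number of grid points, which is why it gives a denser (and slower) result.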
spatial_correlation.py
compute_correlation_matrix
compute_correlation_matrix(X)
Computes pairwise Pearson correlation coefficients between all genes.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| X | np.ndarray | An (N × G) matrix where rows are spatial points and columns are genes. Should be smoothed gene expression data. |
Returns: (pearsonR_mat, p_val_mat) — Tuple of two (G × G) matrices: Pearson correlation coefficients and corresponding p-values.
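For intuition, a Pearson correlation matrix is the scaled inner product of z-scored gene columns. The sketch below is a NumPy illustration, not Smoothie's code; the helper names are hypothetical. It stops at the t statistic t = r·√((N − 2)/(1 − r²)); two-sided p-values would follow from a t distribution with N − 2 degrees of freedom (for example via SciPy), which is omitted here.

```python
import numpy as np


def pearson_matrix(X):
    """Sketch: (G x G) Pearson correlations between the columns (genes) of an (N x G) matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each gene; zero-variance genes give NaN
    return (Z.T @ Z) / X.shape[0]


def t_statistics(r, n, eps=1e-12):
    """t statistic for testing r = 0 with n points (p-values need a t distribution with n - 2 df)."""
    r = np.clip(r, -1 + eps, 1 - eps)  # keep the diagonal (r = 1) finite
    return r * np.sqrt((n - 2) / (1 - r ** 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)  # gene 1 closely tracks gene 0

R = pearson_matrix(X)
T = t_statistics(R, X.shape[0])
```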
get_correlations_to_GOI
get_correlations_to_GOI(pearsonR_mat, gene_names, GOI, reverse_order=False, plot_histogram=True)
Retrieves and ranks the correlation of all genes with a specified gene of interest (GOI).
Parameters:
| Parameter | Type | Description |
|---|---|---|
| pearsonR_mat | np.ndarray | A (G × G) Pearson correlation coefficient matrix. |
| gene_names | list, np.ndarray, or pd.Index | Gene names corresponding to matrix indices. |
| GOI | str | The gene of interest for which correlations are ranked. |
| reverse_order | bool, optional | If True, sorts correlations in ascending order. Default is False (descending). |
| plot_histogram | bool, optional | If True, plots a histogram of correlation values to GOI. Default is True. |
Returns: (G × 2) np.ndarray — First column contains gene names, second column contains correlation values with the GOI, sorted by correlation strength.
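The ranking step can be pictured as taking the GOI's row of the correlation matrix and sorting it. This is a hypothetical NumPy sketch (`rank_to_goi` is not a Smoothie function) without the histogram; it keeps the GOI itself, which ranks first with r = 1 in descending order.

```python
import numpy as np


def rank_to_goi(pearsonR_mat, gene_names, GOI, reverse_order=False):
    """Sketch: rank every gene by its correlation with a gene of interest."""
    gene_names = np.asarray(gene_names)
    idx = np.flatnonzero(gene_names == GOI)[0]  # position of the GOI (IndexError if absent)
    r = pearsonR_mat[idx]
    order = np.argsort(r)  # ascending
    if not reverse_order:
        order = order[::-1]  # default: strongest positive correlation first
    out = np.empty((len(order), 2), dtype=object)  # (G x 2): name, correlation
    out[:, 0] = gene_names[order]
    out[:, 1] = r[order]
    return out


genes = ["A", "B", "C"]
R = np.array([[1.0, 0.8, -0.2],
              [0.8, 1.0, 0.1],
              [-0.2, 0.1, 1.0]])
ranked = rank_to_goi(R, genes, "A")
ranked_rev = rank_to_goi(R, genes, "A", reverse_order=True)
```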
network_analysis.py
make_spatial_network
make_spatial_network(pearsonR_mat, gene_names, pcc_cutoff, clustering_power,
output_folder=None, save_file_prefix="", gene_labels_list=None,
gene_labels_names=None, trials=20, random_seed=0)
Generates a spatial co-expression network with hard thresholding (pcc_cutoff) and soft power transformation (clustering_power), then performs Infomap clustering and calculates network metrics.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| pearsonR_mat | np.ndarray | Square matrix of Pearson correlation coefficients between genes. |
| gene_names | list or pd.Index | Gene names corresponding to rows/columns of the correlation matrix. |
| pcc_cutoff | float | Threshold for the Pearson correlation coefficient; only correlations above this value are retained. |
| clustering_power | float | Soft power that controls rescaling of PCC values. Higher values favor more gene modules. |
| output_folder | str, optional | Directory where output files are saved. If None, no files are saved. Default is None. |
| save_file_prefix | str, optional | String prefix added to output filenames. |
| gene_labels_list | list of list/tuple/np.ndarray, optional | Gene set labels for visualization. Each item must have the same length as the number of genes. |
| gene_labels_names | list, optional | Names for the gene sets in gene_labels_list. |
| trials | int, optional | Number of Infomap clustering trials. Default is 20. |
| random_seed | int, optional | Random seed for the Infomap clustering algorithm. Default is 0. |
Returns: (edge_list, node_label_df) — Edge list in the format [gene1, gene2, PCC, Rescaled_PCC], and a DataFrame with gene names, module labels, and network metrics.
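The thresholding half of this function can be sketched as follows. The rescaling rule Rescaled_PCC = PCC ** clustering_power is an assumption about what "soft power transformation" means here, and `thresholded_edges` is a hypothetical helper; the Infomap clustering and network metrics are not shown. Raising weights to a higher power widens the gap between strong and moderate edges (0.9³/0.6³ = 3.375 versus 0.9/0.6 = 1.5), which is consistent with higher powers splitting the network into more modules.

```python
import numpy as np


def thresholded_edges(pearsonR_mat, gene_names, pcc_cutoff, clustering_power):
    """Sketch: keep gene pairs with PCC above pcc_cutoff; rescale weights as PCC ** clustering_power (assumed)."""
    i, j = np.triu_indices_from(pearsonR_mat, k=1)  # each unordered gene pair once, no self-edges
    pcc = pearsonR_mat[i, j]
    keep = pcc > pcc_cutoff  # hard threshold
    return [
        [gene_names[a], gene_names[b], float(p), float(p ** clustering_power)]  # soft power rescaling
        for a, b, p in zip(i[keep], j[keep], pcc[keep])
    ]


genes = ["A", "B", "C", "D"]
R = np.array([[1.0, 0.9, 0.6, 0.1],
              [0.9, 1.0, 0.7, 0.2],
              [0.6, 0.7, 1.0, 0.3],
              [0.1, 0.2, 0.3, 1.0]])
edges = thresholded_edges(R, genes, pcc_cutoff=0.5, clustering_power=3)
```

Gene D has no edge above the cutoff, so it would not appear in the network at all.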
make_geneset_spatial_network
make_geneset_spatial_network(pearsonR_mat, gene_names, node_label_df, gene_list,
low_pcc_cutoff, output_folder=None, intra_geneset_edges_only=False)
Given a clustered network, constructs a new network with a lower PCC cutoff applied to a provided gene set.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| pearsonR_mat | np.ndarray | Pearson correlation coefficient matrix. |
| gene_names | list | Gene names corresponding to rows/columns of pearsonR_mat. |
| node_label_df | pd.DataFrame | DataFrame with gene names and module labels from make_spatial_network. |
| gene_list | list | Genes to retain in the subset network. |
| low_pcc_cutoff | float | Minimum PCC required to include an edge. Must be ≤ the pcc_cutoff used to generate node_label_df. |
| output_folder | str, optional | Directory where output files are saved. If None, no files are saved. Default is None. |
| intra_geneset_edges_only | bool, optional | If True, only includes edges where both nodes are in gene_list. Default is False. |
Returns: (geneset_edge_list, geneset_node_label_df). An edge list in the format [gene1, gene2, PCC], and a DataFrame with updated module labels, including a weak_module_label column for gene_list genes whose correlation with a gene in the previously clustered network (the node_label_df modules) is above low_pcc_cutoff.
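The edge-selection rule can be sketched in NumPy. This hypothetical `geneset_edges` helper is not Smoothie's code, and it assumes that with intra_geneset_edges_only=False an edge is kept when at least one endpoint is in gene_list; the weak_module_label assignment is not modeled.

```python
import numpy as np


def geneset_edges(pearsonR_mat, gene_names, gene_list, low_pcc_cutoff, intra_geneset_edges_only=False):
    """Sketch: edges with PCC >= low_pcc_cutoff that touch the gene set."""
    in_set = np.isin(np.asarray(gene_names), list(gene_list))
    i, j = np.triu_indices_from(pearsonR_mat, k=1)  # each unordered gene pair once
    pcc = pearsonR_mat[i, j]
    if intra_geneset_edges_only:
        touches = in_set[i] & in_set[j]  # both endpoints in gene_list
    else:
        touches = in_set[i] | in_set[j]  # at least one endpoint in gene_list (assumed)
    keep = touches & (pcc >= low_pcc_cutoff)
    return [[gene_names[a], gene_names[b], float(p)] for a, b, p in zip(i[keep], j[keep], pcc[keep])]


genes = ["A", "B", "C", "D"]
R = np.array([[1.0, 0.9, 0.6, 0.1],
              [0.9, 1.0, 0.7, 0.2],
              [0.6, 0.7, 1.0, 0.3],
              [0.1, 0.2, 0.3, 1.0]])
all_touching = geneset_edges(R, genes, ["C", "D"], low_pcc_cutoff=0.25)
intra_only = geneset_edges(R, genes, ["C", "D"], low_pcc_cutoff=0.25, intra_geneset_edges_only=True)
```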
multi_sample_integration.py
concatenate_smoothed_matrices
concatenate_smoothed_matrices(sm_adata_list)
Concatenates smoothed count matrices from multiple datasets, aligning gene columns across datasets.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| sm_adata_list | list of AnnData | AnnData objects with smoothed count matrices. |
Returns: (X_concat, gene_names) — Concatenated (N × G) matrix with aligned genes, and the corresponding list of gene names.
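Gene alignment amounts to giving every dataset the same column order before stacking rows. The source does not say whether Smoothie aligns on the union or the intersection of genes; this hypothetical sketch assumes the union, sorted alphabetically, with zeros for genes a dataset lacks.

```python
import numpy as np


def concat_aligned(matrices, gene_name_lists):
    """Sketch: stack (N_i x G_i) matrices row-wise on the union of genes, zero-filling absent genes."""
    all_genes = sorted(set().union(*gene_name_lists))  # assumed: union, alphabetical order
    col = {g: k for k, g in enumerate(all_genes)}
    blocks = []
    for X, genes in zip(matrices, gene_name_lists):
        block = np.zeros((X.shape[0], len(all_genes)))
        block[:, [col[g] for g in genes]] = X  # place each gene in its shared column
        blocks.append(block)
    return np.vstack(blocks), all_genes


X_concat, gene_names = concat_aligned(
    [np.array([[1, 2], [3, 4]]), np.array([[5, 6]])],
    [["A", "B"], ["B", "C"]],
)
```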
run_second_order_correlation_analysis
run_second_order_correlation_analysis(sm_adata_list, pcc_cutoff, node_label_df=None,
E_max=25, output_folder=None, seed=0)
Comprehensive function for second-order correlation gene embedding analysis across multiple samples. Workflow: (1) aligns correlation matrices, (2) creates gene embeddings, (3) computes stability metrics, and (4) adds stability values to node_label_df. The output is described under Returns below.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| sm_adata_list | list of AnnData | Smoothed AnnData objects from different spatial transcriptomics samples. Each must have gene expression data in .X and gene names in .var_names. |
| pcc_cutoff | float | Minimum PCC threshold. Genes must exceed this value in at least one dataset to be included in embedding analysis. Typical values: 0.4–0.5. |
| node_label_df | pd.DataFrame, optional | Gene module assignments with columns 'name' and 'module_label'. If provided, embedding features are balanced across modules and stability metrics are added as a 'gene_stabilities' column. Default is None. |
| E_max | int, optional | Maximum number of genes per module used for embedding features. Controls embedding dimensionality and balances representation across modules. Typical values: 15–50. Default is 25. |
| output_folder | str, optional | Path to folder where results are saved. If None, files are not saved. Default is None. |
| seed | int, optional | Random seed for reproducibility when downsampling embedding features. Default is 0. |
Returns: dict with keys:
- 'gene_embeddings_tensor' (np.ndarray): Shape (n_genes, n_features, n_datasets). Each gene's embedding is a vector of Fisher Z-transformed correlation values with other genes (features).
- 'robust_gene_names' (np.ndarray): Shape (n_genes,). These are the genes that passed the pcc_cutoff correlation threshold in at least one dataset in sm_adata_list.
- 'gene_stabilities' (np.ndarray): Shape (n_datasets, n_datasets, n_genes). Tensor of gene stabilities across dataset pairs. Values range from -1 to 1, where higher values indicate more stable (conserved) co-expression patterns. A stability of -1 designates that a gene was missing in one of the datasets.
- 'aligned_pearsonR_tensor' (np.ndarray): Shape (n_all_genes, n_all_genes, n_datasets). Contains aligned Pearson correlation matrices across all datasets.
- 'node_label_df' (pd.DataFrame or None): Input node_label_df with a 'gene_stabilities' column added, or None.
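One way to read "second-order" stability: a gene's embedding in each dataset is its Fisher Z-transformed correlation profile against a set of feature genes, and its stability for a dataset pair is the Pearson correlation between the two embeddings. This is an assumption consistent with the -1 to 1 range above, not a confirmed description of Smoothie's method; the helpers are hypothetical, use all genes as features (no E_max balancing), and do not handle missing genes.

```python
import numpy as np


def fisher_z(r, eps=1e-6):
    """Fisher Z-transform; r is clipped away from +/-1 so arctanh stays finite."""
    return np.arctanh(np.clip(r, -1 + eps, 1 - eps))


def gene_stability(R_a, R_b, gene_idx, feature_idx):
    """Sketch: correlate one gene's Fisher-Z correlation profile between two datasets."""
    features = [f for f in feature_idx if f != gene_idx]  # drop the self-correlation feature (assumed)
    emb_a = fisher_z(R_a[gene_idx, features])  # second-order embedding in dataset a
    emb_b = fisher_z(R_b[gene_idx, features])  # same gene, same features, dataset b
    return float(np.corrcoef(emb_a, emb_b)[0, 1])  # in [-1, 1]; higher = more conserved


rng = np.random.default_rng(0)
base = np.corrcoef(rng.normal(size=(6, 50)))  # toy (6 x 6) gene correlation matrix
other = np.corrcoef(rng.normal(size=(6, 50)))  # unrelated toy correlation matrix
stable = gene_stability(base, base, gene_idx=0, feature_idx=range(6))
unrelated = gene_stability(base, other, gene_idx=0, feature_idx=range(6))
```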
plot_dataset_pair_stabilities
plot_dataset_pair_stabilities(gene_stabilities, dataset_names, output_folder=None,
figsize=None, fontsize=5, dpi=300, file_format='png')
Visualizes the distribution of gene stability between all dataset pairs using violin plots. See run_second_order_correlation_analysis for more info.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| gene_stabilities | np.ndarray | Stabilities tensor of shape (n_datasets, n_datasets, n_genes) from run_second_order_correlation_analysis. |
| dataset_names | list of str | Dataset names for axis labels. |
| output_folder | str, optional | Directory to save the plot. If None, does not save. |
| figsize | tuple, optional | Figure size in inches. If None, auto-calculated based on number of pairs. |
| fontsize | int, optional | Font size for labels and titles. Default is 5. |
| dpi | int, optional | Resolution for saved figure. Default is 300. |
| file_format | str, optional | 'png' or 'pdf'. Default is 'png'. |
Returns: None.
plot_gene_stability_distribution
plot_gene_stability_distribution(gene_stabilities, robust_gene_names, node_label_df=None,
output_folder=None, figsize=(3,2), fontsize=6,
bins=50, dpi=300, file_format='png')
Plots the distribution of per-gene minimum stabilities across all dataset pairs and returns a DataFrame of minimum gene stabilities, sorted from most to least stable. See run_second_order_correlation_analysis for more info.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| gene_stabilities | np.ndarray | Stabilities tensor of shape (n_datasets, n_datasets, n_genes). |
| robust_gene_names | np.ndarray | Gene names corresponding to the third dimension of gene_stabilities. |
| node_label_df | pd.DataFrame, optional | Node labels DataFrame with 'name' and 'module_label' columns. If provided, module labels are included in output. |
| output_folder | str, optional | Directory to save the plot. If None, does not save. |
| figsize | tuple, optional | Figure size in inches. Default is (3, 2). |
| fontsize | int, optional | Font size for axes labels. Default is 6. |
| bins | int, optional | Number of histogram bins. Default is 50. |
| dpi | int, optional | Resolution for saved figure. Default is 300. |
| file_format | str, optional | 'png' or 'pdf'. Default is 'png'. |
Returns: pd.DataFrame with columns ['gene', 'module', 'min_stability'], sorted from most to least stable. If node_label_df is not provided, the 'module' column is NaN.
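The per-gene minimum stability can be computed directly from the stabilities tensor. This NumPy sketch (`min_stability_per_gene` is hypothetical) assumes the diagonal, which compares a dataset with itself, is excluded, and returns arrays instead of a DataFrame. Because a missing gene gets a stability of -1, such genes end up at the bottom of the ranking.

```python
import numpy as np


def min_stability_per_gene(gene_stabilities, robust_gene_names):
    """Sketch: minimum stability over all distinct dataset pairs, sorted most to least stable."""
    n = gene_stabilities.shape[0]
    i, j = np.triu_indices(n, k=1)  # distinct dataset pairs only (diagonal excluded, assumed)
    mins = gene_stabilities[i, j, :].min(axis=0)  # shape (n_genes,)
    order = np.argsort(mins)[::-1]  # most stable first
    return np.asarray(robust_gene_names)[order], mins[order]


S = np.ones((3, 3, 2))  # 3 datasets, 2 genes
S[0, 1, 0] = S[1, 0, 0] = 0.9
S[0, 2, 0] = S[2, 0, 0] = 0.8
S[1, 2, 1] = S[2, 1, 1] = 0.2
names, mins = min_stability_per_gene(S, ["g1", "g2"])
```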
plot_gene_stability
plot_gene_stability(dataset_names, gene_name, gene_stabilities, robust_gene_names,
output_folder=None, figsize=(1.5, 1.5), fontsize=5,
dpi=300, file_format='png', x_ticks=True, cbar=True)
Plots a compact stability heatmap for a specific gene showing pairwise dataset correlations. See run_second_order_correlation_analysis for more info.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| dataset_names | list of str | Dataset names for axis labels. |
| gene_name | str | Name of the gene to visualize. |
| gene_stabilities | np.ndarray | Stabilities tensor of shape (n_datasets, n_datasets, n_genes). |
| robust_gene_names | np.ndarray | Gene names corresponding to the third dimension of gene_stabilities. |
| output_folder | str, optional | Directory to save the plot. If None, does not save. |
| figsize | tuple, optional | Figure size in inches. Default is (1.5, 1.5). |
| fontsize | int, optional | Font size for title and tick labels. Default is 5. |
| dpi | int, optional | Resolution for saved figure. Default is 300. |
| file_format | str, optional | 'png' or 'pdf'. Default is 'png'. |
| x_ticks | bool, optional | Whether to display x-axis dataset labels. Default is True. |
| cbar | bool, optional | Whether to display a colorbar. Default is True. |
Returns: None.
plot_module_stability
plot_module_stability(module_label, gene_stabilities, robust_gene_names, node_label_df,
dataset_names, output_folder=None, figsize=(1.5, 1.5), fontsize=5,
dpi=300, file_format='png', x_ticks=False, cbar=True)
Plots a compact stability heatmap for a module showing average pairwise dataset correlations across all genes in the module. See run_second_order_correlation_analysis for more info.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| module_label | int or str | Module identifier from node_label_df. |
| gene_stabilities | np.ndarray | Stabilities tensor of shape (n_datasets, n_datasets, n_genes). |
| robust_gene_names | np.ndarray | Gene names corresponding to the third dimension of gene_stabilities. |
| node_label_df | pd.DataFrame | DataFrame with 'name' and 'module_label' columns. |
| dataset_names | list of str | Dataset names for axis labels. |
| output_folder | str, optional | Directory to save the plot. If None, does not save. |
| figsize | tuple, optional | Figure size in inches. Default is (1.5, 1.5). |
| fontsize | int, optional | Font size for title and labels. Default is 5. |
| dpi | int, optional | Resolution for saved figure. Default is 300. |
| file_format | str, optional | 'png' or 'pdf'. Default is 'png'. |
| x_ticks | bool, optional | Whether to display x-axis dataset labels. Default is False. |
| cbar | bool, optional | Whether to display a colorbar. Default is True. |
Returns: None.
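The module heatmap is an average of per-gene heatmaps. In this hypothetical sketch, `module_genes` stands in for something like `node_label_df.loc[node_label_df['module_label'] == module_label, 'name']`; whether Smoothie excludes genes with a missing-dataset stability of -1 from the average is not stated, and the sketch keeps them.

```python
import numpy as np


def module_stability_heatmap(gene_stabilities, robust_gene_names, module_genes):
    """Sketch: average the (n_datasets x n_datasets) stability heatmaps of a module's genes."""
    names = np.asarray(robust_gene_names)
    mask = np.isin(names, list(module_genes))  # module genes that were kept as robust genes
    return gene_stabilities[:, :, mask].mean(axis=2)


S = np.ones((2, 2, 3))  # 2 datasets, 3 genes
S[0, 1, :] = S[1, 0, :] = [0.2, 0.4, 0.9]
heat = module_stability_heatmap(S, ["a", "b", "c"], module_genes=["a", "b"])
```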
choosing_hyperparameters.py
select_clustering_params
select_clustering_params(gene_names, pearsonR_mat, permuted_pcc_999=None, output_folder=None,
pcc_cutoffs=[0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
clustering_powers=[1, 3, 5, 7, 9], min_genes_for_module=3,
infomap_clustering_trials=1, full_metrics=False)
Evaluates combinations of PCC cutoff and clustering power hyperparameters for spatial network construction, generating diagnostic plots of clustering quality metrics to guide hyperparameter selection.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| gene_names | list, np.ndarray, or pd.Index | Gene names corresponding to rows/columns of pearsonR_mat. |
| pearsonR_mat | np.ndarray | Square matrix of Pearson correlation coefficients between genes. |
| permuted_pcc_999 | float, optional | The 99.9th percentile PCC from a shuffled-data null distribution (output of compute_shuffled_correlation_percentiles). If provided, appended to pcc_cutoffs as an additional cutoff to evaluate. Default is None. |
| output_folder | str, optional | Directory where output plots and CSV results are saved. If None, plots are displayed but not saved. Default is None. |
| pcc_cutoffs | list of float, optional | PCC hard-threshold cutoff values to evaluate. Default is [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]. |
| clustering_powers | list of int or float, optional | Soft-power exponents to evaluate for rescaling PCC edge weights. Default is [1, 3, 5, 7, 9]. |
| min_genes_for_module | int, optional | Minimum number of genes required for a cluster to be counted as a module. Default is 3. |
| infomap_clustering_trials | int, optional | Number of Infomap algorithm trials per hyperparameter combination. More trials improve stability. Default is 1 for runtime efficiency. |
| full_metrics | bool, optional | If True, plots all metrics (mean_gene_margin, fraction_margin_positive, modularity, n_clusters, n_genes_included). If False, plots only mean_gene_margin, n_clusters, and n_genes_included. Default is False. |
Returns: None. Displays diagnostic plots and optionally saves them along with a results CSV to output_folder.
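The sweep is a grid over every (pcc_cutoff, clustering_power) pair, with permuted_pcc_999 appended as one more cutoff when given. The sketch below is hypothetical: it computes only a simplified n_genes_included (genes with at least one edge above the cutoff, ignoring module-size filtering), because the other metrics require building and clustering the network with Infomap.

```python
import itertools

import numpy as np


def n_genes_included(pearsonR_mat, pcc_cutoff):
    """Simplified proxy: genes with at least one off-diagonal PCC above the cutoff."""
    R = pearsonR_mat.copy()
    np.fill_diagonal(R, -np.inf)  # ignore self-correlations
    return int(np.sum((R > pcc_cutoff).any(axis=1)))


def evaluate_grid(pearsonR_mat, pcc_cutoffs, clustering_powers, permuted_pcc_999=None):
    """Sketch of the hyperparameter sweep; module-based metrics (which need Infomap) are omitted."""
    cutoffs = list(pcc_cutoffs)
    if permuted_pcc_999 is not None:
        cutoffs.append(permuted_pcc_999)  # also evaluate the shuffled-null cutoff
    results = []
    for cutoff, power in itertools.product(cutoffs, clustering_powers):
        # A full evaluation would build the network, run Infomap, and score modules here.
        results.append({"pcc_cutoff": cutoff, "clustering_power": power,
                        "n_genes_included": n_genes_included(pearsonR_mat, cutoff)})
    return results


R = np.array([[1.0, 0.9, 0.6, 0.1],
              [0.9, 1.0, 0.7, 0.2],
              [0.6, 0.7, 1.0, 0.3],
              [0.1, 0.2, 0.3, 1.0]])
results = evaluate_grid(R, pcc_cutoffs=[0.5, 0.8], clustering_powers=[1, 3], permuted_pcc_999=0.65)
```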
plotting.py
rotate_spatial
rotate_spatial(adata, angle_degrees=0, flip_x=False, flip_y=False,
spatial_key='spatial', center=True)
Flips and rotates the spatial coordinates in an AnnData object.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| adata | AnnData | The AnnData object to modify. |
| angle_degrees | float, optional | Angle to rotate in degrees (counter-clockwise). Default is 0. |
| flip_x | bool, optional | If True, mirrors the tissue horizontally. Default is False. |
| flip_y | bool, optional | If True, mirrors the tissue vertically. Default is False. |
| spatial_key | str, optional | Key in adata.obsm storing (x, y) coordinates. Default is 'spatial'. |
| center | bool, optional | If True, flips and rotates around the tissue center. If False, rotates around the (0,0) origin. Default is True. |
Returns: AnnData — The updated AnnData object.
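The coordinate math behind a flip-then-rotate can be sketched on an (n × 2) array. In this hypothetical helper, the pivot for center=True is assumed to be the bounding-box midpoint (the source only says "tissue center"), and flips are assumed to be applied before the rotation. With AnnData, the result would be written back with something like `adata.obsm['spatial'] = rotate_coords(adata.obsm['spatial'], 90)`.

```python
import numpy as np


def rotate_coords(xy, angle_degrees=0.0, flip_x=False, flip_y=False, center=True):
    """Sketch: mirror, then rotate (n x 2) coordinates counter-clockwise about a pivot."""
    xy = np.asarray(xy, dtype=float)
    # Assumed pivot: bounding-box midpoint when center=True, otherwise the origin.
    pivot = (xy.min(axis=0) + xy.max(axis=0)) / 2 if center else np.zeros(2)
    out = xy - pivot  # new array, so the input is not modified
    if flip_x:
        out[:, 0] *= -1  # mirror horizontally (negate x)
    if flip_y:
        out[:, 1] *= -1  # mirror vertically (negate y)
    theta = np.deg2rad(angle_degrees)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])  # counter-clockwise rotation matrix
    return out @ rot.T + pivot  # rotate each row vector, then shift back
```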
plot_gene
plot_gene(sm_adata, gene_name, output_folder=None, figsize=(1,1), spot_size=25,
fontsize=6, fontfamily='sans-serif', cmap=None, dpi=300, file_format='png')
Plots the spatial expression of a specified gene in a single spatial transcriptomic dataset.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| sm_adata | AnnData | AnnData object containing spatial transcriptomic data. |
| gene_name | str | Name of the gene to visualize. |
| output_folder | str, optional | Directory path where the output plot is saved. If None, displayed but not saved. Default is None. |
| figsize | tuple, optional | Figure size in inches (width, height). Default is (1, 1). |
| spot_size | int, optional | Size of spatial spots in the plot. Default is 25. |
| fontsize | int, optional | Font size for the plot title. Default is 6. |
| fontfamily | str, optional | Font family for the plot title. Default is 'sans-serif'. |
| cmap | Colormap, optional | Colormap for visualizing expression. If None, defaults to a custom gray-red-black colormap. |
| dpi | int, optional | Resolution of the saved plot in DPI. Default is 300. |
| file_format | str, optional | 'png' or 'pdf'. Default is 'png'. |
Returns: None.
plot_modules
plot_modules(sm_adata, node_label_df, output_folder=None, plots_per_row=5, min_genes=3,
figsize=None, spot_size=25, fontsize=6, fontfamily='sans-serif',
cmap=None, dpi=300, file_format='png')
Plots module scores spatially, arranging plots in rows with a fixed number of columns.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| sm_adata | AnnData | AnnData object containing spatial transcriptomic data. |
| node_label_df | pd.DataFrame | Gene module assignments with columns 'module_label' and 'name'. |
| output_folder | str, optional | Directory path where output plots are saved. If None, displayed but not saved. Default is None. |
| plots_per_row | int, optional | Number of plots per row in each output file. Default is 5. |
| min_genes | int, optional | Minimum number of genes required for a module to be plotted. Default is 3. |
| figsize | tuple, optional | Figure size in inches. If None, defaults to (plots_per_row, 1). |
| spot_size | int, optional | Size of spatial spots. Default is 25. |
| fontsize | int, optional | Font size for subplot titles. Default is 6. |
| fontfamily | str, optional | Font family for plot titles. Default is 'sans-serif'. |
| cmap | Colormap, optional | Colormap for visualizing expression. If None, defaults to a custom gray-red-black colormap. |
| dpi | int, optional | Resolution of saved plots in DPI. Default is 300. |
| file_format | str, optional | 'png' or 'pdf'. Default is 'png'. |
Returns: None.
plot_gene_multisample
plot_gene_multisample(sm_adata_list, adata_list_names, gene_name, output_folder=None,
shared_scaling=False, figsize=None, spot_size=25, fontsize=6,
fontfamily='sans-serif', cmap=None, dpi=300, file_format='png')
Plots the spatial expression of a specified gene across multiple spatial transcriptomic datasets.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| sm_adata_list | list of AnnData | AnnData objects for each dataset. |
| adata_list_names | list of str | Dataset names for labeling subplots. |
| gene_name | str | Name of the gene to visualize. |
| output_folder | str, optional | Directory where output plots are saved. If None, displayed but not saved. Default is None. |
| shared_scaling | bool, optional | If True, normalizes using the maximum expression value across all datasets. If False, normalizes within each dataset independently. Default is False. |
| figsize | tuple, optional | Figure size in inches. If None, defaults to (n_datasets, 1). |
| spot_size | int, optional | Size of spatial spots. Default is 25. |
| fontsize | int, optional | Font size for subplot titles. Default is 6. |
| fontfamily | str, optional | Font family for text annotations. Default is 'sans-serif'. |
| cmap | Colormap, optional | Colormap for visualizing expression. If None, defaults to a custom gray-red-black colormap. |
| dpi | int, optional | Resolution of saved plots in DPI. Default is 300. |
| file_format | str, optional | 'png' or 'pdf'. Default is 'png'. |
Returns: None.
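The difference between the two shared_scaling settings is which maximum each dataset is divided by. This hypothetical sketch assumes "normalizes" means dividing by the maximum; shared scaling keeps expression levels comparable across panels, while independent scaling lets each panel use the full color range. A dataset whose maximum is 0 would need special handling that is omitted here.

```python
import numpy as np


def scale_for_plotting(values_per_dataset, shared_scaling=False):
    """Sketch: divide each dataset's expression values by a maximum before color mapping."""
    if shared_scaling:
        global_max = max(np.max(v) for v in values_per_dataset)  # one maximum across all datasets
        return [np.asarray(v, dtype=float) / global_max for v in values_per_dataset]
    return [np.asarray(v, dtype=float) / np.max(v) for v in values_per_dataset]  # per-dataset maximum


shared = scale_for_plotting([[1, 2], [4, 8]], shared_scaling=True)
independent = scale_for_plotting([[1, 2], [4, 8]], shared_scaling=False)
```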
plot_modules_multisample
plot_modules_multisample(sm_adata_list, adata_list_names, node_label_df, output_folder=None,
shared_scaling=False, min_genes=3, figsize=None, spot_size=25,
fontsize=6, fontfamily='sans-serif', cmap=None, dpi=300, file_format='png')
Plots gene module scores across multiple spatial transcriptomic datasets, with an option for shared scaling.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| sm_adata_list | list of AnnData | AnnData objects for each dataset. |
| adata_list_names | list of str | Dataset names for labeling subplots. |
| node_label_df | pd.DataFrame | Gene module assignments with columns 'module_label' and 'name'. |
| output_folder | str, optional | Directory where output plots are saved. If None, displayed but not saved. Default is None. |
| shared_scaling | bool, optional | If True, normalizes using the maximum expression value across all datasets. If False, normalizes within each dataset independently. Default is False. |
| min_genes | int, optional | Minimum number of genes required for a module to be plotted. Default is 3. |
| figsize | tuple, optional | Figure size in inches. If None, defaults to (n_datasets, 1). |
| spot_size | int, optional | Size of spatial spots. Default is 25. |
| fontsize | int, optional | Font size for subplot titles. Default is 6. |
| fontfamily | str, optional | Font family for text annotations. Default is 'sans-serif'. |
| cmap | Colormap, optional | Colormap for visualizing expression. If None, defaults to a custom gray-red-black colormap. |
| dpi | int, optional | Resolution of saved plots in DPI. Default is 300. |
| file_format | str, optional | 'png' or 'pdf'. Default is 'png'. |
Returns: None.
create_anndata_from_transcripts.py
create_anndata_from_transcripts
create_anndata_from_transcripts(input_file, x_col, y_col, gene_col, output_file,
file_format='csv', min_counts_per_gene=1,
count_col=None, chunksize=1000000)
Converts submicron-level spatial data (MERFISH, Xenium, CosMx, seqFISH, Stereo-seq bin1) to AnnData format. Each unique coordinate becomes a separate spot, preserving coordinates in their original units.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| input_file | str | Path to input file (CSV, Parquet, TSV, or TSV.GZ). |
| x_col | str | Column name for x coordinates (e.g., 'x_location', 'global_x', 'x'). |
| y_col | str | Column name for y coordinates (e.g., 'y_location', 'global_y', 'y'). |
| gene_col | str | Column name for gene names (e.g., 'feature_name', 'gene', 'target', 'geneID'). |
| output_file | str or None | Path to save output .h5ad file. If None, does not save to disk. |
| file_format | str, optional | Input file format: 'csv', 'parquet', 'tsv', or 'tsv.gz'. Default is 'csv'. |
| min_counts_per_gene | int, optional | Minimum number of transcripts per gene to include in output. Default is 1. |
| count_col | str, optional | Column name for pre-aggregated counts (e.g., 'MIDCounts'). If None, assumes one transcript per row. |
| chunksize | int, optional | Number of rows to read at a time for TSV/TSV.GZ files. Default is 1,000,000. |
Returns: AnnData with:
- X: Sparse CSR matrix of gene counts (spots × genes)
- obs: Spot metadata with 'total_counts'
- var: Gene metadata with 'total_counts'
- obsm['spatial']: Spatial coordinates (n_spots × 2)
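The core aggregation (each unique coordinate becomes a spot, transcripts are tallied per spot and gene, and rare genes are dropped) can be sketched with NumPy alone. This hypothetical helper builds a dense matrix for clarity; a real converter would read the file in chunks and store a sparse matrix (for example `scipy.sparse.csr_matrix`) inside an AnnData object.

```python
import numpy as np


def transcripts_to_counts(x, y, genes, counts=None, min_counts_per_gene=1):
    """Sketch: one spot per unique (x, y), one column per gene, summed transcript counts."""
    coords = np.column_stack([x, y])
    spots, spot_idx = np.unique(coords, axis=0, return_inverse=True)  # unique coordinates become spots
    spot_idx = spot_idx.ravel()  # some NumPy versions return a 2-D inverse when axis is given
    gene_names, gene_idx = np.unique(np.asarray(genes), return_inverse=True)
    weights = np.ones(len(spot_idx)) if counts is None else np.asarray(counts, dtype=float)
    X = np.zeros((len(spots), len(gene_names)))
    np.add.at(X, (spot_idx, gene_idx), weights)  # one transcript per row unless counts is given
    keep = X.sum(axis=0) >= min_counts_per_gene  # drop genes with too few transcripts
    return X[:, keep], spots, gene_names[keep]


x = [0, 0, 1, 0]
y = [0, 0, 1, 0]
genes = ["g1", "g2", "g1", "g1"]
X, spots, names = transcripts_to_counts(x, y, genes)
X_min2, _, names_min2 = transcripts_to_counts(x, y, genes, min_counts_per_gene=2)
```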
Example Usage:
The command line is recommended for long-running conversions; run it inside a tmux session (or equivalent) so the job survives a dropped connection.
Command Line:
Xenium (10x Genomics):
```shell
python -m smoothie.create_anndata_from_transcripts \
    transcripts.parquet \
    --x-col x_location \
    --y-col y_location \
    --gene-col feature_name \
    --format parquet \
    --output xenium_data.h5ad
```
Stereo-seq (MGI/BGI):
```shell
python -m smoothie.create_anndata_from_transcripts \
    E9.5_E1S1_GEM_bin1.tsv.gz \
    --x-col x \
    --y-col y \
    --gene-col geneID \
    --count-col MIDCounts \
    --format tsv.gz \
    --chunksize 1000000 \
    --output stereoseq_data.h5ad
```
MERFISH (Vizgen):
```shell
python -m smoothie.create_anndata_from_transcripts \
    detected_transcripts.csv \
    --x-col global_x \
    --y-col global_y \
    --gene-col gene \
    --format csv \
    --output merfish_data.h5ad
```
CosMx (NanoString):
```shell
python -m smoothie.create_anndata_from_transcripts \
    transcripts.csv \
    --x-col x_global_px \
    --y-col y_global_px \
    --gene-col target \
    --format csv \
    --output cosmx_data.h5ad
```
seqFISH/seqFISH+:
```shell
python -m smoothie.create_anndata_from_transcripts \
    transcripts.csv \
    --x-col x \
    --y-col y \
    --gene-col gene \
    --format csv \
    --output seqfish_data.h5ad
```
Python:
Xenium (10x Genomics):
```python
import smoothie

adata = smoothie.create_anndata_from_transcripts(
    'transcripts.parquet',
    x_col='x_location',
    y_col='y_location',
    gene_col='feature_name',
    file_format='parquet',
    output_file='xenium_data.h5ad'
)
```
Stereo-seq (MGI/BGI):
```python
import smoothie

adata = smoothie.create_anndata_from_transcripts(
    'E9.5_E1S1_GEM_bin1.tsv.gz',
    x_col='x',
    y_col='y',
    gene_col='geneID',
    count_col='MIDCounts',  # pre-aggregated counts per row
    file_format='tsv.gz',
    output_file='stereoseq_data.h5ad'
)
```
MERFISH (Vizgen):
```python
import smoothie

adata = smoothie.create_anndata_from_transcripts(
    'detected_transcripts.csv',
    x_col='global_x',
    y_col='global_y',
    gene_col='gene',
    file_format='csv',
    output_file='merfish_data.h5ad'
)
```
CosMx (NanoString):
```python
import smoothie

adata = smoothie.create_anndata_from_transcripts(
    'transcripts.csv',
    x_col='x_global_px',
    y_col='y_global_px',
    gene_col='target',
    file_format='csv',
    output_file='cosmx_data.h5ad'
)
```
seqFISH/seqFISH+:
```python
import smoothie

adata = smoothie.create_anndata_from_transcripts(
    'transcripts.csv',
    x_col='x',
    y_col='y',
    gene_col='gene',
    file_format='csv',
    output_file='seqfish_data.h5ad'
)
```
shuffle_analysis.py
compute_shuffled_correlation_percentiles
compute_shuffled_correlation_percentiles(adata, grid_based_or_not, gaussian_sd=20,
stride=None, grid_fitting_dist=None,
min_spots_under_gaussian=25, num_processes=4,
num_data_splits=None, seed=None)
Shuffles spatial coordinates, smooths the shuffled data, and computes percentiles of the resulting gene-gene correlation matrix to estimate a null distribution.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| adata | AnnData or list of AnnData | Single AnnData object or list of AnnData objects. For multiple datasets, genes are aligned. |
| grid_based_or_not | bool | True = bin shuffling + grid-based smoothing (for subcellular resolution data); False = in-place shuffling + in-place smoothing (for cellular resolution data). |
| gaussian_sd | float, optional | Standard deviation of the Gaussian kernel in coordinate units. Default is 20. |
| min_spots_under_gaussian | int, optional | Minimum number of spots within a radius of 3 × gaussian_sd required for valid smoothing at a location. Default is 25. |
| stride | float, optional | Grid spacing for smoothing (grid-based only). Default is 1 × gaussian_sd. |
| grid_fitting_dist | float, optional | Minimum distance from grid point to data for grid fitting (grid-based only). Default is 0.25 × gaussian_sd. |
| num_processes | int, optional | Number of parallel processes for smoothing. Default is 4. |
| num_data_splits | int, optional | Number of data splits for parallel processing. If None, determined automatically. |
| seed | int, optional | Random seed for reproducibility. Default is None. |
Returns: (p95, p99, p999) — The 95th, 99th, and 99.9th percentiles of the shuffled correlation coefficient distribution. Use these as pcc_cutoff inputs to select_clustering_params or make_spatial_network.
Note: Call this function with the same parameters used for run_parallelized_smoothing.
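Why a shuffled null is needed: smoothing averages neighboring spots, which lowers the effective number of independent observations, so even unrelated genes can reach sizable PCCs after smoothing. Shuffling positions before smoothing destroys real spatial structure while keeping that smoothing effect. The sketch below is hypothetical: it shows only the in-place variant with brute-force smoothing suited to toy sizes; the bin shuffling used in grid-based mode is not modeled.

```python
import numpy as np


def smooth_inplace(X, coords, sd):
    """Brute-force Gaussian smoothing at every spot (toy sizes only)."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    W = np.exp(-d2 / (2 * sd ** 2))
    W[d2 > (3 * sd) ** 2] = 0.0  # truncate the kernel at 3 * sd
    return W @ X / W.sum(axis=1, keepdims=True)


def shuffled_percentiles(X, coords, sd, seed=None):
    """Sketch: shuffle spot positions, smooth, and take percentiles of gene-gene PCCs."""
    rng = np.random.default_rng(seed)
    shuffled = coords[rng.permutation(len(coords))]  # break the spatial structure
    R = np.corrcoef(smooth_inplace(X, shuffled, sd), rowvar=False)
    pcc = R[np.triu_indices_from(R, k=1)]  # off-diagonal gene pairs only
    return tuple(np.percentile(pcc, [95, 99, 99.9]))


rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(300, 2))
X = rng.poisson(2.0, size=(300, 20)).astype(float)  # 20 toy genes
p95, p99, p999 = shuffled_percentiles(X, coords, sd=10.0, seed=1)
```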
utils.py
suppress_warnings
suppress_warnings()
Suppresses specific warnings that Smoothie commonly triggers due to memory-efficient operations, including AnnData ImplicitModificationWarning and scanpy view warnings. Called automatically on import.
Returns: None.
enable_warnings
enable_warnings()
Re-enables suppressed warnings by resetting all warning filters to their defaults.
Returns: None.
quiet_mode
quiet_mode()
Context manager for temporarily suppressing Smoothie warnings in a specific code block. Warnings are automatically restored after the block exits.
Returns: Context manager.
Example:
```python
import smoothie

with smoothie.quiet_mode():
    sm_adata = smoothie.run_parallelized_smoothing(...)
# warnings restored after the block
```
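A context manager with this behavior can be built from the standard library's `warnings.catch_warnings`, which saves the warning filters on entry and restores them on exit. The sketch below is a hypothetical reimplementation (`quiet` and `noisy` are illustrative names), not Smoothie's code; it uses `UserWarning` in place of the AnnData and scanpy categories Smoothie targets. Note that `catch_warnings` modifies process-global state and is not thread-safe.

```python
import contextlib
import warnings


@contextlib.contextmanager
def quiet(*categories):
    """Sketch: ignore the given warning categories (all warnings if none given) inside the block."""
    with warnings.catch_warnings():  # saves the current filters and restores them on exit
        for category in categories or (Warning,):
            warnings.simplefilter("ignore", category=category)
        yield


def noisy():
    """Hypothetical function that emits a warning."""
    warnings.warn("implicit modification", UserWarning)
    return 42
```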
