Smoothie API Reference

gaussian_smoothing.py

`run_parallelized_smoothing`

run_parallelized_smoothing(adata, grid_based_or_not, gaussian_sd, min_spots_under_gaussian,
                           stride=None, grid_fitting_dist=None, num_processes=10, num_data_splits=None)

Performs parallelized Gaussian smoothing on spatial transcriptomics data.

Parameters:

Parameter	Type	Description
`adata`	AnnData	The input dataset.
`grid_based_or_not`	bool	`True` = grid-based smoothing (smooth only at hexagonal grid points); `False` = in-place smoothing (smooth at every spatial location). In-place is effective for cell-sized or larger resolution (10–50 µm). Grid-based is effective for subcellular to cell-sized resolution (0.5–10 µm).
`gaussian_sd`	float	Standard deviation for the Gaussian kernel in coordinate units. Use `target_microns × micron_to_unit_conversion`. Target of 20–40 µm is generally ideal. See `docs/guides/micron_to_unit_conversion_table.md`.
`min_spots_under_gaussian`	int	Minimum number of data points within radius `3 × gaussian_sd` required for smoothing to occur at a location (typical range: 25–100).
`stride`	float, optional	Stride value for grid-based smoothing. Default is `1 × gaussian_sd`; `0.5 × gaussian_sd` allows a denser grid.
`grid_fitting_dist`	float, optional	Minimum distance a grid point must be from a spatial coordinate to be retained. Default is `0.25 × gaussian_sd`.
`num_processes`	int, optional	Number of parallel processes to use. Default is 10.
`num_data_splits`	int, optional	Number of data chunks for memory-efficient processing. If unspecified, automatically selected.

Returns: sm_adata (AnnData) — The smoothed AnnData object.

spatial_correlation.py

`compute_correlation_matrix`

compute_correlation_matrix(X)

Computes pairwise Pearson correlation coefficients between all genes.

Parameters:

Parameter	Type	Description
`X`	np.ndarray	An `(N × G)` matrix where rows are spatial points and columns are genes. Should be smoothed gene expression data.

Returns: (pearsonR_mat, p_val_mat) — Tuple of two (G × G) matrices: Pearson correlation coefficients and corresponding p-values.

`get_correlations_to_GOI`

get_correlations_to_GOI(pearsonR_mat, gene_names, GOI, reverse_order=False, plot_histogram=True)

Retrieves and ranks the correlation of all genes with a specified gene of interest (GOI).

Parameters:

Parameter	Type	Description
`pearsonR_mat`	np.ndarray	A `(G × G)` Pearson correlation coefficient matrix.
`gene_names`	list, np.ndarray, or pd.Index	Gene names corresponding to matrix indices.
`GOI`	str	The gene of interest for which correlations are ranked.
`reverse_order`	bool, optional	If `True`, sorts correlations in ascending order. Default is `False` (descending).
`plot_histogram`	bool, optional	If `True`, plots a histogram of correlation values to GOI. Default is `True`.

Returns: (G × 2) np.ndarray — First column contains gene names, second column contains correlation values with the GOI, sorted by correlation strength.

network_analysis.py

`make_spatial_network`

make_spatial_network(pearsonR_mat, gene_names, pcc_cutoff, clustering_power,
                     output_folder=None, save_file_prefix="", gene_labels_list=None,
                     gene_labels_names=None, trials=20, random_seed=0)

Generates a spatial co-expression network with hard thresholding (pcc_cutoff) and soft power transformation (clustering_power), then performs Infomap clustering and calculates network metrics.

Parameters:

Parameter	Type	Description
`pearsonR_mat`	np.ndarray	Square matrix of Pearson correlation coefficients between genes.
`gene_names`	list or pd.Index	Gene names corresponding to rows/columns of the correlation matrix.
`pcc_cutoff`	float	Threshold for the Pearson correlation coefficient; only correlations above this value are retained.
`clustering_power`	float	Soft power that controls rescaling of PCC values. Higher values favor more gene modules.
`output_folder`	str, optional	Directory where output files are saved. If `None`, no files are saved. Default is `None`.
`save_file_prefix`	str, optional	String prefix added to output filenames.
`gene_labels_list`	list of list/tuple/np.ndarray, optional	Gene set labels for visualization. Each item must have the same length as the number of genes.
`gene_labels_names`	list, optional	Names for the gene sets in `gene_labels_list`.
`trials`	int, optional	Number of Infomap clustering trials. Default is 20.
`random_seed`	int, optional	Random seed for the Infomap clustering algorithm. Default is 0.

Returns: (edge_list, node_label_df) — Edge list in the format [gene1, gene2, PCC, Rescaled_PCC], and a DataFrame with gene names, module labels, and network metrics.

`make_geneset_spatial_network`

make_geneset_spatial_network(pearsonR_mat, gene_names, node_label_df, gene_list,
                              low_pcc_cutoff, output_folder=None, intra_geneset_edges_only=False)

Given a clustered network, constructs a new network with a lower PCC cutoff applied to a provided gene set.

Parameters:

Parameter	Type	Description
`pearsonR_mat`	np.ndarray	Pearson correlation coefficient matrix.
`gene_names`	list	Gene names corresponding to rows/columns of `pearsonR_mat`.
`node_label_df`	pd.DataFrame	DataFrame with gene names and module labels from `make_spatial_network`.
`gene_list`	list	Genes to retain in the subset network.
`low_pcc_cutoff`	float	Minimum PCC required to include an edge. Must be ≤ the `pcc_cutoff` used to generate `node_label_df`.
`output_folder`	str, optional	Directory where output files are saved. If `None`, no files are saved. Default is `None`.
`intra_geneset_edges_only`	bool, optional	If `True`, only includes edges where both nodes are in `gene_list`. Default is `False`.

Returns: (geneset_edge_list, geneset_node_label_df) — Edge list in format [gene1, gene2, PCC], and a DataFrame with updated module labels including column weak_module_label for gene_list genes that have a correlation above low_pcc_cutoff with another gene in the previously clustered network (node_label_df modules).

multi_sample_integration.py

`concatenate_smoothed_matrices`

concatenate_smoothed_matrices(sm_adata_list)

Concatenates smoothed count matrices from multiple datasets, aligning gene columns across datasets.

Parameters:

Parameter	Type	Description
`sm_adata_list`	list of AnnData	AnnData objects with smoothed count matrices.

Returns: (X_concat, gene_names) — Concatenated (N × G) matrix with aligned genes, and the corresponding list of gene names.

`run_second_order_correlation_analysis`

run_second_order_correlation_analysis(sm_adata_list, pcc_cutoff, node_label_df=None,
                                      E_max=25, output_folder=None, seed=0)

Comprehensive function for second-order correlation gene embedding analysis across multiple samples. Workflow: (1) aligns correlation matrices, (2) creates gene embeddings, (3) computes stability metrics, and (4) adds stability values to node_label_df. Check this function's Returns section for information on output.

Parameters:

Parameter	Type	Description
`sm_adata_list`	list of AnnData	Smoothed AnnData objects from different spatial transcriptomics samples. Each must have gene expression data in `.X` and gene names in `.var_names`.
`pcc_cutoff`	float	Minimum PCC threshold. Genes must exceed this value in at least one dataset to be included in embedding analysis. Typical values: 0.4–0.5.
`node_label_df`	pd.DataFrame, optional	Gene module assignments with columns `'name'` and `'module_label'`. If provided, embedding features are balanced across modules and stability metrics are added as a `'gene_stabilities'` column. Default is `None`.
`E_max`	int, optional	Maximum number of genes per module used for embedding features. Controls embedding dimensionality and balances representation across modules. Typical values: 15–50. Default is 25.
`output_folder`	str, optional	Path to folder where results are saved. If `None`, files are not saved. Default is `None`.
`seed`	int, optional	Random seed for reproducibility when downsampling embedding features. Default is 0.

Returns: dict with keys:

'gene_embeddings_tensor' (np.ndarray): Shape (n_genes, n_features, n_datasets). Each gene's embedding is a vector of Fisher Z-transformed correlation values with other genes (features).
'robust_gene_names' (np.ndarray): Shape (n_genes). These are the genes that passed the pcc_cutoff correlation threshold in at least one dataset in sm_adata_list.
'gene_stabilities' (np.ndarray): Shape (n_datasets, n_datasets, n_genes) Tensor of gene stabilities across dataset pairs. Values range from -1 to 1, where higher values indicate more stable/conserved co-expression patterns. A stability of -1 designates that a gene was missing in one of the datasets.
'aligned_pearsonR_tensor' (np.ndarray): Shape (n_all_genes, n_all_genes, n_datasets). Contains aligned Pearson correlation matrices across all datasets.
'node_label_df' (pd.DataFrame or None): Input node_label_df with 'gene_stabilities' column added, or None.

`plot_dataset_pair_stabilities`

plot_dataset_pair_stabilities(gene_stabilities, dataset_names, output_folder=None,
                               figsize=None, fontsize=5, dpi=300, file_format='png')

Visualizes the distribution of gene stability between all dataset pairs using violin plots. See run_second_order_correlation_analysis for more info.

Parameters:

Parameter	Type	Description
`gene_stabilities`	np.ndarray	Stabilities tensor of shape `(n_datasets, n_datasets, n_genes)` from `run_second_order_correlation_analysis`.
`dataset_names`	list of str	Dataset names for axis labels.
`output_folder`	str, optional	Directory to save the plot. If `None`, does not save.
`figsize`	tuple, optional	Figure size in inches. If `None`, auto-calculated based on number of pairs.
`fontsize`	int, optional	Font size for labels and titles. Default is 5.
`dpi`	int, optional	Resolution for saved figure. Default is 300.
`file_format`	str, optional	`'png'` or `'pdf'`. Default is `'png'`.

Returns: None.

`plot_gene_stability_distribution`

plot_gene_stability_distribution(gene_stabilities, robust_gene_names, node_label_df=None,
                                  output_folder=None, figsize=(3,2), fontsize=6,
                                  bins=50, dpi=300, file_format='png')

Plots the distribution of per gene minimum stabilities across all dataset pairs and returns a sorted dataframe of minimum gene stabilities. See run_second_order_correlation_analysis for more info.

Parameters:

Parameter	Type	Description
`gene_stabilities`	np.ndarray	Stabilities tensor of shape `(n_datasets, n_datasets, n_genes)`.
`robust_gene_names`	np.ndarray	Gene names corresponding to the third dimension of `gene_stabilities`.
`node_label_df`	pd.DataFrame, optional	Node labels DataFrame with `'name'` and `'module_label'` columns. If provided, module labels are included in output.
`output_folder`	str, optional	Directory to save the plot. If `None`, does not save.
`figsize`	tuple, optional	Figure size in inches. Default is `(3, 2)`.
`fontsize`	int, optional	Font size for axes labels. Default is 6.
`bins`	int, optional	Number of histogram bins. Default is 50.
`dpi`	int, optional	Resolution for saved figure. Default is 300.
`file_format`	str, optional	`'png'` or `'pdf'`. Default is `'png'`.

Returns: pd.DataFrame with columns ['gene', 'module', 'min_stability'] sorted from most to least stable. If node_label_df is not provided, 'module' column will be NaN.

`plot_gene_stability`

plot_gene_stability(dataset_names, gene_name, gene_stabilities, robust_gene_names,
                    output_folder=None, figsize=(1.5, 1.5), fontsize=5,
                    dpi=300, file_format='png', x_ticks=True, cbar=True)

Plots a compact stability heatmap for a specific gene showing pairwise dataset correlations. See run_second_order_correlation_analysis for more info.

Parameters:

Parameter	Type	Description
`dataset_names`	list of str	Dataset names for axis labels.
`gene_name`	str	Name of the gene to visualize.
`gene_stabilities`	np.ndarray	Stabilities tensor of shape `(n_datasets, n_datasets, n_genes)`.
`robust_gene_names`	np.ndarray	Gene names corresponding to the third dimension of `gene_stabilities`.
`output_folder`	str, optional	Directory to save the plot. If `None`, does not save.
`figsize`	tuple, optional	Figure size in inches. Default is `(1.5, 1.5)`.
`fontsize`	int, optional	Font size for title and tick labels. Default is 5.
`dpi`	int, optional	Resolution for saved figure. Default is 300.
`file_format`	str, optional	`'png'` or `'pdf'`. Default is `'png'`.
`x_ticks`	bool, optional	Whether to display x-axis dataset labels. Default is `True`.
`cbar`	bool, optional	Whether to display a colorbar. Default is `True`.

Returns: None.

`plot_module_stability`

plot_module_stability(module_label, gene_stabilities, robust_gene_names, node_label_df,
                      dataset_names, output_folder=None, figsize=(1.5, 1.5), fontsize=5,
                      dpi=300, file_format='png', x_ticks=False, cbar=True)

Plots a compact stability heatmap for a module showing average pairwise dataset correlations across all genes in the module. See run_second_order_correlation_analysis for more info.

Parameters:

Parameter	Type	Description
`module_label`	int or str	Module identifier from `node_label_df`.
`gene_stabilities`	np.ndarray	Stabilities tensor of shape `(n_datasets, n_datasets, n_genes)`.
`robust_gene_names`	np.ndarray	Gene names corresponding to the third dimension of `gene_stabilities`.
`node_label_df`	pd.DataFrame	DataFrame with `'name'` and `'module_label'` columns.
`dataset_names`	list of str	Dataset names for axis labels.
`output_folder`	str, optional	Directory to save the plot. If `None`, does not save.
`figsize`	tuple, optional	Figure size in inches. Default is `(1.5, 1.5)`.
`fontsize`	int, optional	Font size for title and labels. Default is 5.
`dpi`	int, optional	Resolution for saved figure. Default is 300.
`file_format`	str, optional	`'png'` or `'pdf'`. Default is `'png'`.
`x_ticks`	bool, optional	Whether to display x-axis dataset labels. Default is `False`.
`cbar`	bool, optional	Whether to display a colorbar. Default is `True`.

Returns: None.

choosing_hyperparameters.py

`select_clustering_params`

select_clustering_params(gene_names, pearsonR_mat, permuted_pcc_999=None, output_folder=None,
                          pcc_cutoffs=[0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
                          clustering_powers=[1, 3, 5, 7, 9], min_genes_for_module=3,
                          infomap_clustering_trials=1, full_metrics=False)

Evaluates combinations of PCC cutoff and clustering power hyperparameters for spatial network construction, generating diagnostic plots of clustering quality metrics to guide hyperparameter selection.

Parameters:

Parameter	Type	Description
`gene_names`	list, np.ndarray, or pd.Index	Gene names corresponding to rows/columns of `pearsonR_mat`.
`pearsonR_mat`	np.ndarray	Square matrix of Pearson correlation coefficients between genes.
`permuted_pcc_999`	float, optional	The 99.9th percentile PCC from a shuffled-data null distribution (output of `compute_shuffled_correlation_percentiles`). If provided, appended to `pcc_cutoffs` as an additional cutoff to evaluate. Default is `None`.
`output_folder`	str, optional	Directory where output plots and CSV results are saved. If `None`, plots are displayed but not saved. Default is `None`.
`pcc_cutoffs`	list of float, optional	PCC hard-threshold cutoff values to evaluate. Default is `[0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]`.
`clustering_powers`	list of int or float, optional	Soft-power exponents to evaluate for rescaling PCC edge weights. Default is `[1, 3, 5, 7, 9]`.
`min_genes_for_module`	int, optional	Minimum number of genes required for a cluster to be counted as a module. Default is 3.
`infomap_clustering_trials`	int, optional	Number of Infomap algorithm trials per hyperparameter combination. More trials improve stability. Default is 1 for runtime efficiency.
`full_metrics`	bool, optional	If `True`, plots all metrics (`mean_gene_margin`, `fraction_margin_positive`, `modularity`, `n_clusters`, `n_genes_included`). If `False`, plots only `mean_gene_margin`, `n_clusters`, and `n_genes_included`. Default is `False`.

Returns: None. Displays diagnostic plots and optionally saves them along with a results CSV to output_folder.

plotting.py

`rotate_spatial`

rotate_spatial(adata, angle_degrees=0, flip_x=False, flip_y=False,
               spatial_key='spatial', center=True)

Flips and rotates the spatial coordinates in an AnnData object.

Parameters:

Parameter	Type	Description
`adata`	AnnData	The AnnData object to modify.
`angle_degrees`	float, optional	Angle to rotate in degrees (counter-clockwise). Default is 0.
`flip_x`	bool, optional	If `True`, mirrors the tissue horizontally. Default is `False`.
`flip_y`	bool, optional	If `True`, mirrors the tissue vertically. Default is `False`.
`spatial_key`	str, optional	Key in `adata.obsm` storing `(x, y)` coordinates. Default is `'spatial'`.
`center`	bool, optional	If `True`, flips and rotates around the tissue center. If `False`, rotates around the `(0,0)` origin. Default is `True`.

Returns: AnnData — The updated AnnData object.

`plot_gene`

plot_gene(sm_adata, gene_name, output_folder=None, figsize=(1,1), spot_size=25,
          fontsize=6, fontfamily='sans-serif', cmap=None, dpi=300, file_format='png')

Plots the spatial expression of a specified gene in a single spatial transcriptomic dataset.

Parameters:

Parameter	Type	Description
`sm_adata`	AnnData	AnnData object containing spatial transcriptomic data.
`gene_name`	str	Name of the gene to visualize.
`output_folder`	str, optional	Directory path where the output plot is saved. If `None`, displayed but not saved. Default is `None`.
`figsize`	tuple, optional	Figure size in inches `(width, height)`. Default is `(1, 1)`.
`spot_size`	int, optional	Size of spatial spots in the plot. Default is 25.
`fontsize`	int, optional	Font size for the plot title. Default is 6.
`fontfamily`	str, optional	Font family for the plot title. Default is `'sans-serif'`.
`cmap`	Colormap, optional	Colormap for visualizing expression. If `None`, defaults to a custom gray-red-black colormap.
`dpi`	int, optional	Resolution of the saved plot in DPI. Default is 300.
`file_format`	str, optional	`'png'` or `'pdf'`. Default is `'png'`.

Returns: None.

`plot_modules`

plot_modules(sm_adata, node_label_df, output_folder=None, plots_per_row=5, min_genes=3,
             figsize=None, spot_size=25, fontsize=6, fontfamily='sans-serif',
             cmap=None, dpi=300, file_format='png')

Plots module scores spatially, arranging plots in rows with a fixed number of columns.

Parameters:

Parameter	Type	Description
`sm_adata`	AnnData	AnnData object containing spatial transcriptomic data.
`node_label_df`	pd.DataFrame	Gene module assignments with columns `'module_label'` and `'name'`.
`output_folder`	str, optional	Directory path where output plots are saved. If `None`, displayed but not saved. Default is `None`.
`plots_per_row`	int, optional	Number of plots per row in each output file. Default is 5.
`min_genes`	int, optional	Minimum number of genes required for a module to be plotted. Default is 3.
`figsize`	tuple, optional	Figure size in inches. Default is `(plots_per_row, 1)`.
`spot_size`	int, optional	Size of spatial spots. Default is 25.
`fontsize`	int, optional	Font size for subplot titles. Default is 6.
`fontfamily`	str, optional	Font family for plot titles. Default is `'sans-serif'`.
`cmap`	Colormap, optional	Colormap for visualizing expression. If `None`, defaults to a custom gray-red-black colormap.
`dpi`	int, optional	Resolution of saved plots in DPI. Default is 300.
`file_format`	str, optional	`'png'` or `'pdf'`. Default is `'png'`.

Returns: None.

`plot_gene_multisample`

plot_gene_multisample(sm_adata_list, adata_list_names, gene_name, output_folder=None,
                      shared_scaling=False, figsize=None, spot_size=25, fontsize=6,
                      fontfamily='sans-serif', cmap=None, dpi=300, file_format='png')

Plots the spatial expression of a specified gene across multiple spatial transcriptomic datasets.

Parameters:

Parameter	Type	Description
`sm_adata_list`	list of AnnData	AnnData objects for each dataset.
`adata_list_names`	list of str	Dataset names for labeling subplots.
`gene_name`	str	Name of the gene to visualize.
`output_folder`	str, optional	Directory where output plots are saved. If `None`, displayed but not saved. Default is `None`.
`shared_scaling`	bool, optional	If `True`, normalizes using the maximum expression value across all datasets. If `False`, normalizes within each dataset independently. Default is `False`.
`figsize`	tuple, optional	Figure size in inches. Default is `(n_datasets, 1)`.
`spot_size`	int, optional	Size of spatial spots. Default is 25.
`fontsize`	int, optional	Font size for subplot titles. Default is 6.
`fontfamily`	str, optional	Font family for text annotations. Default is `'sans-serif'`.
`cmap`	Colormap, optional	Colormap for visualizing expression. If `None`, defaults to a custom gray-red-black colormap.
`dpi`	int, optional	Resolution of saved plots in DPI. Default is 300.
`file_format`	str, optional	`'png'` or `'pdf'`. Default is `'png'`.

Returns: None.

`plot_modules_multisample`

plot_modules_multisample(sm_adata_list, adata_list_names, node_label_df, output_folder=None,
                          shared_scaling=False, min_genes=3, figsize=None, spot_size=25,
                          fontsize=6, fontfamily='sans-serif', cmap=None, dpi=300, file_format='png')

Plots gene module scores across multiple spatial transcriptomic datasets, with an option for shared scaling.

Parameters:

Parameter	Type	Description
`sm_adata_list`	list of AnnData	AnnData objects for each dataset.
`adata_list_names`	list of str	Dataset names for labeling subplots.
`node_label_df`	pd.DataFrame	Gene module assignments with columns `'module_label'` and `'name'`.
`output_folder`	str, optional	Directory where output plots are saved. If `None`, displayed but not saved. Default is `None`.
`shared_scaling`	bool, optional	If `True`, normalizes using the maximum expression value across all datasets. If `False`, normalizes within each dataset independently. Default is `False`.
`min_genes`	int, optional	Minimum number of genes required for a module to be plotted. Default is 3.
`figsize`	tuple, optional	Figure size in inches. Default is `(n_datasets, 1)`.
`spot_size`	int, optional	Size of spatial spots. Default is 25.
`fontsize`	int, optional	Font size for subplot titles. Default is 6.
`fontfamily`	str, optional	Font family for text annotations. Default is `'sans-serif'`.
`cmap`	Colormap, optional	Colormap for visualizing expression. If `None`, defaults to a custom gray-red-black colormap.
`dpi`	int, optional	Resolution of saved plots in DPI. Default is 300.
`file_format`	str, optional	`'png'` or `'pdf'`. Default is `'png'`.

Returns: None.

create_anndata_from_transcripts.py

`create_anndata_from_transcripts`

create_anndata_from_transcripts(input_file, x_col, y_col, gene_col, output_file,
                                 file_format='csv', min_counts_per_gene=1,
                                 count_col=None, chunksize=1000000)

Converts submicron-level spatial data (MERFISH, Xenium, CosMx, seqFISH, Stereo-seq bin1) to AnnData format. Each unique coordinate becomes a separate spot, preserving coordinates in their original units.

Parameters:

Parameter	Type	Description
`input_file`	str	Path to input file (CSV, Parquet, TSV, or TSV.GZ).
`x_col`	str	Column name for x coordinates (e.g., `'x_location'`, `'global_x'`, `'x'`).
`y_col`	str	Column name for y coordinates (e.g., `'y_location'`, `'global_y'`, `'y'`).
`gene_col`	str	Column name for gene names (e.g., `'feature_name'`, `'gene'`, `'target'`, `'geneID'`).
`output_file`	str	Path to save output `.h5ad` file. If `None`, does not save to disk.
`file_format`	str, optional	Input file format: `'csv'`, `'parquet'`, `'tsv'`, or `'tsv.gz'`. Default is `'csv'`.
`min_counts_per_gene`	int, optional	Minimum number of transcripts per gene to include in output. Default is 1.
`count_col`	str, optional	Column name for pre-aggregated counts (e.g., `'MIDCounts'`). If `None`, assumes one transcript per row.
`chunksize`	int, optional	Number of rows to read at a time for TSV/TSV.GZ files. Default is 1,000,000.

Returns: AnnData with:

X: Sparse CSR matrix of gene counts (spots × genes)
obs: Spot metadata with 'total_counts'
var: Gene metadata with 'total_counts'
obsm['spatial']: Spatial coordinates (n_spots × 2)

Example Usage:

Command line recommended — run in a tmux screen or equivalent for long-running conversions.

Command Line:

Xenium (10x Genomics):

python -m smoothie.create_anndata_from_transcripts \
    transcripts.parquet \
    --x-col x_location \
    --y-col y_location \
    --gene-col feature_name \
    --format parquet \
    --output xenium_data.h5ad

Stereo-seq (MGI/BGI):

python -m smoothie.create_anndata_from_transcripts \
    E9.5_E1S1_GEM_bin1.tsv.gz \
    --x-col x \
    --y-col y \
    --gene-col geneID \
    --count-col MIDCounts \
    --format tsv.gz \
    --chunksize 1000000 \
    --output stereoseq_data.h5ad

MERFISH (Vizgen):

python -m smoothie.create_anndata_from_transcripts \
    detected_transcripts.csv \
    --x-col global_x \
    --y-col global_y \
    --gene-col gene \
    --format csv \
    --output merfish_data.h5ad

CosMx (NanoString):

python -m smoothie.create_anndata_from_transcripts \
    transcripts.csv \
    --x-col x_global_px \
    --y-col y_global_px \
    --gene-col target \
    --format csv \
    --output cosmx_data.h5ad

seqFISH/seqFISH+:

python -m smoothie.create_anndata_from_transcripts \
    transcripts.csv \
    --x-col x \
    --y-col y \
    --gene-col gene \
    --format csv \
    --output seqfish_data.h5ad

Python:

Xenium (10x Genomics):

adata = smoothie.create_anndata_from_transcripts(
    'transcripts.parquet',
    x_col='x_location',
    y_col='y_location',
    gene_col='feature_name',
    file_format='parquet',
    output_file='xenium_data.h5ad'
)

Stereo-seq (MGI/BGI):

adata = smoothie.create_anndata_from_transcripts(
    'E9.5_E1S1_GEM_bin1.tsv.gz',
    x_col='x',
    y_col='y',
    gene_col='geneID',
    count_col='MIDCounts',
    file_format='tsv.gz',
    output_file='stereoseq_data.h5ad'
)

MERFISH (Vizgen):

adata = smoothie.create_anndata_from_transcripts(
    'detected_transcripts.csv',
    x_col='global_x',
    y_col='global_y',
    gene_col='gene',
    file_format='csv',
    output_file='merfish_data.h5ad'
)

CosMx (NanoString):

adata = smoothie.create_anndata_from_transcripts(
    'transcripts.csv',
    x_col='x_global_px',
    y_col='y_global_px',
    gene_col='target',
    file_format='csv',
    output_file='cosmx_data.h5ad'
)

seqFISH/seqFISH+:

adata = smoothie.create_anndata_from_transcripts(
    'transcripts.csv',
    x_col='x',
    y_col='y',
    gene_col='gene',
    file_format='csv',
    output_file='seqfish_data.h5ad'
)

shuffle_analysis.py

`compute_shuffled_correlation_percentiles`

compute_shuffled_correlation_percentiles(adata, grid_based_or_not, gaussian_sd=20,
                                          stride=None, grid_fitting_dist=None,
                                          min_spots_under_gaussian=25, num_processes=4,
                                          num_data_splits=None, seed=None)

Shuffles spatial coordinates, smoothes data, and computes correlation matrix percentiles for null distribution estimation.

Parameters:

Parameter	Type	Description
`adata`	AnnData or list of AnnData	Single AnnData object or list of AnnData objects. For multiple datasets, genes are aligned.
`grid_based_or_not`	bool	`True` = bin shuffling + grid-based smoothing (for subcellular resolution data); `False` = in-place shuffling + in-place smoothing (for cellular resolution data).
`gaussian_sd`	float	Standard deviation of Gaussian kernel in coordinate units.
`min_spots_under_gaussian`	int	Minimum spots required within `3 * gaussian_sd` radius for valid smoothing at a given location.
`stride`	float, optional	Grid spacing for smoothing (grid-based only). Default is `gaussian_sd`.
`grid_fitting_dist`	float, optional	Minimum distance from grid point to data for grid fitting (grid-based only). Default is `0.25 × gaussian_sd`.
`num_processes`	int, optional	Number of parallel processes for smoothing. Default is 4.
`num_data_splits`	int, optional	Number of data splits for parallel processing. If `None`, automatically determined.
`seed`	int, optional	Random seed for reproducibility. Default is `None`.

Returns: (p95, p99, p999) — The 95th, 99th, and 99.9th percentiles of the shuffled correlation coefficient distribution. Use these as pcc_cutoff inputs to select_clustering_params or make_spatial_network.

Note: Call this function with the same parameters used for run_parallelized_smoothing.

utils.py

`suppress_warnings`

suppress_warnings()

Suppresses specific warnings that Smoothie commonly triggers due to memory-efficient operations, including AnnData ImplicitModificationWarning and scanpy view warnings. Called automatically on import.

Returns: None.

`enable_warnings`

enable_warnings()

Re-enables all warnings that were suppressed by resetting all warning filters to default.

Returns: None.

`quiet_mode`

quiet_mode()

Context manager for temporarily suppressing Smoothie warnings in a specific code block. Warnings are automatically restored after the block exits.

Returns: Context manager.

Example:

with smoothie.quiet_mode():
    sm_adata = smoothie.run_parallelized_smoothing(...)
# warnings restored after the block

surprise