snap — Compute & Save Statistics

The snap subcommand reads result folders produced by kmeans-model, ptep-model, or fgbuster-model, computes statistics (residuals, power spectra, \(r\) estimation), and saves them to lightweight .parquet files for later plotting.

Basic Usage

r_analysis snap \
    -n 64 \
    -r "kmeans_BD10000_TD500_BS500_GAL020" \
    -ird results/ \
    -o snapshots/my_run.parquet

Run Matching with -r

The -r flag controls which result folders are selected. It supports three matching modes depending on the pattern syntax.

Token Matching (Exact)

When the pattern contains no regex metacharacters, it is split by _ into tokens. A folder matches if all tokens are present in the folder name (AND logic).

# Match folders containing BOTH "kmeans" AND "BD10000"
r_analysis snap -r "kmeans_BD10000" -ird results/ -o out.parquet

# Match folders containing "kmeans", "BD4000", "TD500", "BS500", "GAL020"
r_analysis snap -r "BD4000_TD500_BS500_GAL020" -ird results/ -o out.parquet

The folder name is also split by _, and each token from the pattern must match at least one token in the folder name.

Regex Matching (Expand)

When a token contains regex metacharacters (capture groups with \d, \w, etc.), each unique combination of captured values creates a separate entry in the output.

# Match all runs, extract BD/TD/BS/GAL values — each unique combination becomes a separate kw
r_analysis snap -r "BD(\d+)_TD(\d+)_BS(\d+)_GAL(\d+)" -ird results/ -o out.parquet

For example, if results/ contains:

kmeans_c1d1s1_BD4000_TD500_BS500_..._GAL020_...
kmeans_c1d1s1_BD8000_TD500_BS500_..._GAL020_...
kmeans_c1d1s1_BD4000_TD500_BS500_..._GAL040_...

The pattern BD(\d+)_TD(\d+)_BS(\d+)_GAL(\d+) produces three separate entries:

  • BD4000_TD500_BS500_GAL020

  • BD8000_TD500_BS500_GAL020

  • BD4000_TD500_BS500_GAL040

Partial Matching (Merge Masks)

You can use a partial pattern to match folders across different mask configurations. All matched folders are merged into a single entry.

# Match all runs with BD4000_TD500_BS500 regardless of mask
# This effectively merges GAL020 + GAL040 + GAL060 masks → fsky ≈ 60%
r_analysis snap -r "BD4000_TD500_BS500" -ird results/ -o out.parquet

Combining Runs

--combine

Merge all matched result directories into a single entry rather than keeping them separate:

r_analysis snap -r "kmeans_BD4000" "ptep_BD64" \
    -ird results/ \
    --combine \
    --name "combined_run" \
    -o out.parquet

--name

Set display names for each run group:

r_analysis snap -r "kmeans_BD4000" "ptep_BD64" \
    -ird results/ \
    --name "K-Means (4000)" "PTEP (64)" \
    -o out.parquet

by default, the display name is the matched pattern (e.g., BD4000_TD500_BS500). For combined runs, it is recommended to give an explicit name so it easier to match when plotting using plot

See also

For reducing the number of clusters via post-clustering parameter binning, see bin.

All Arguments

Flag

Type

Default

Description

-o, --output-parquet

str

required

Path to output .parquet file

--noise-selection

str

min-value

Which noise realization to use: min-value, min-nll, or an integer index

--max-ns

int

all

Maximum number of noise realizations

--no-images

flag

False

Skip rendering mollview images (faster)

--combine

flag

False

Merge all matched dirs into one entry

--name

str (list)

auto

Display names for run groups

--max-size

int

unlimited

Max entries per parquet file (splits into numbered files)

Plus all common arguments (-n, -r, -ird, etc.).

Output

The output is a .parquet file (powered by HuggingFace datasets) containing one row per matched run group. Each row stores:

  • CMB reconstruction maps and patch assignments

  • Power spectra (\(C_\ell^{BB}\) observed, templates, residuals)

  • \(r\) estimation (best fit, confidence bounds, likelihood curve)

  • Systematic and statistical residual maps

  • Foreground parameter maps (\(\beta_d\), \(T_d\), \(\beta_s\))

  • Metadata (keyword, number of clusters, NLL, mask info)