# snap — Compute & Save Statistics

The `snap` subcommand reads result folders produced by `kmeans-model`, `ptep-model`, or `fgbuster-model`, computes statistics (residuals, power spectra, $r$ estimation), and saves them to lightweight `.parquet` files for later plotting.

## Basic Usage

```bash
r_analysis snap \
    -n 64 \
    -r "kmeans_BD10000_TD500_BS500_GAL020" \
    -ird results/ \
    -o snapshots/my_run.parquet
```

## Run Matching with `-r`

The `-r` flag controls which result folders are selected. It supports three matching modes depending on the pattern syntax.

### Token Matching (Exact)

When the pattern contains no regex metacharacters, it is split by `_` into tokens. A folder matches if **all tokens** are present in the folder name (AND logic).

```bash
# Match folders containing BOTH "kmeans" AND "BD10000"
r_analysis snap -r "kmeans_BD10000" -ird results/ -o out.parquet

# Match folders containing "BD4000", "TD500", "BS500", AND "GAL020"
r_analysis snap -r "BD4000_TD500_BS500_GAL020" -ird results/ -o out.parquet
```

The folder name is also split by `_`, and each token from the pattern must match at least one token in the folder name.
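
The token rule can be sketched in a few lines of Python. This is an illustrative sketch, not the code `r_analysis` actually runs: the function name `token_match` is hypothetical, the folder names are made up, and it assumes tokens are compared by exact, case-sensitive string equality.

```python
def token_match(pattern: str, folder: str) -> bool:
    """Return True if every `_`-separated token of `pattern` is a token of `folder`.

    Hypothetical helper for illustration; assumes exact token equality.
    """
    folder_tokens = set(folder.split("_"))
    # AND logic: every pattern token must appear among the folder tokens
    return all(token in folder_tokens for token in pattern.split("_"))


# Hypothetical folder names, for illustration only
print(token_match("kmeans_BD10000", "kmeans_c1d1s1_BD10000_TD500_BS500_GAL020"))  # True
print(token_match("kmeans_BD10000", "kmeans_c1d1s1_BD4000_TD500_BS500_GAL020"))   # False
```

Under this assumption, whole tokens are compared, so a pattern token `BD1000` would not match a folder token `BD10000`.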

### Regex Matching (Expand)

When a token contains regex metacharacters (for example, a capture group such as `(\d+)` or `(\w+)`), it is treated as a regular expression, and each unique combination of captured values creates a **separate entry** in the output.

```bash
# Match all runs and extract the BD/TD/BS/GAL values; each unique combination becomes a separate entry
r_analysis snap -r "BD(\d+)_TD(\d+)_BS(\d+)_GAL(\d+)" -ird results/ -o out.parquet
```

For example, if `results/` contains:
```
kmeans_c1d1s1_BD4000_TD500_BS500_..._GAL020_...
kmeans_c1d1s1_BD8000_TD500_BS500_..._GAL020_...
kmeans_c1d1s1_BD4000_TD500_BS500_..._GAL040_...
```

The pattern `BD(\d+)_TD(\d+)_BS(\d+)_GAL(\d+)` produces three separate entries:
- `BD4000_TD500_BS500_GAL020`
- `BD8000_TD500_BS500_GAL020`
- `BD4000_TD500_BS500_GAL040`
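
One way to picture the expansion is as a grouping step keyed on the matched tokens. The sketch below is an assumption about the behaviour described here, not the tool's implementation: `expand_key` and `group_folders` are hypothetical names, each `_`-separated pattern token is assumed to be matched as a full regex against individual folder tokens, and the folder names use `x`/`y`/`z` placeholders for the parts elided above.

```python
import re
from collections import defaultdict


def expand_key(pattern, folder):
    """Build the entry key for `folder`, or return None if a pattern token has no match."""
    folder_tokens = folder.split("_")
    parts = []
    for token in pattern.split("_"):
        rx = re.compile(token)
        # Keep the first folder token that this pattern token matches in full
        matches = [m for m in (rx.fullmatch(t) for t in folder_tokens) if m]
        if not matches:
            return None
        parts.append(matches[0].group(0))
    return "_".join(parts)


def group_folders(pattern, folders):
    """Group folders by entry key; each distinct key stands for one output entry."""
    groups = defaultdict(list)
    for folder in folders:
        key = expand_key(pattern, folder)
        if key is not None:
            groups[key].append(folder)
    return dict(groups)


# Hypothetical folder names; "x", "y", "z" stand in for the elided parts
folders = [
    "kmeans_c1d1s1_BD4000_TD500_BS500_x_GAL020_y",
    "kmeans_c1d1s1_BD8000_TD500_BS500_x_GAL020_y",
    "kmeans_c1d1s1_BD4000_TD500_BS500_x_GAL040_y",
    "kmeans_c1d1s1_BD4000_TD500_BS500_x_GAL020_z",
]
groups = group_folders(r"BD(\d+)_TD(\d+)_BS(\d+)_GAL(\d+)", folders)
for key, members in sorted(groups.items()):
    print(key, len(members))
```

In this sketch the first and last folders share the key `BD4000_TD500_BS500_GAL020`, so four folders collapse into three entries. A pattern with no capture groups, such as `BD4000_TD500_BS500`, yields the same key for every matching folder and therefore a single group.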

### Partial Matching (Merge Masks)

You can use a partial pattern to match folders across different mask configurations. All matched folders are merged into a single entry.

```bash
# Match all runs with BD4000_TD500_BS500 regardless of mask
# This effectively merges GAL020 + GAL040 + GAL060 masks → fsky ≈ 60%
r_analysis snap -r "BD4000_TD500_BS500" -ird results/ -o out.parquet
```

## Combining Runs

### `--combine`

Merge all matched result directories into a **single entry** rather than keeping them separate:

```bash
r_analysis snap -r "kmeans_BD4000" "ptep_BD64" \
    -ird results/ \
    --combine \
    --name "combined_run" \
    -o out.parquet
```

### `--name`

Set display names for each run group:

```bash
r_analysis snap -r "kmeans_BD4000" "ptep_BD64" \
    -ird results/ \
    --name "K-Means (4000)" "PTEP (64)" \
    -o out.parquet
```

By default, the display name is the matched pattern (e.g., `BD4000_TD500_BS500`).
For combined runs, give an explicit name so that the entry is easier to match when plotting with [plot](plot.md).

::::{seealso}
For reducing the number of clusters via post-clustering parameter binning, see [`bin`](bin.md).
::::

## All Arguments

| Flag | Type | Default | Description |
|---|---|---|---|
| `-o`, `--output-parquet` | `str` | *required* | Path to output `.parquet` file |
| `--noise-selection` | `str` | `min-value` | Which noise realization to use: `min-value`, `min-nll`, or an integer index |
| `--max-ns` | `int` | all | Maximum number of noise realizations |
| `--no-images` | flag | `False` | Skip rendering mollview images (faster) |
| `--combine` | flag | `False` | Merge all matched dirs into one entry |
| `--name` | `str` (list) | auto | Display names for run groups |
| `--max-size` | `int` | unlimited | Max entries per parquet file (splits into numbered files) |

Plus all [common arguments](index.md#common-arguments) (`-n`, `-r`, `-ird`, etc.).

## Output

The output is a `.parquet` file (powered by the Hugging Face `datasets` library) containing one row per matched run group. Each row stores:

- CMB reconstruction maps and patch assignments
- Power spectra ($C_\ell^{BB}$ observed, templates, residuals)
- $r$ estimation (best fit, confidence bounds, likelihood curve)
- Systematic and statistical residual maps
- Foreground parameter maps ($\beta_d$, $T_d$, $\beta_s$)
- Metadata (keyword, number of clusters, NLL, mask info)