Bin Pair Selectors: Filtering Two-Point Correlations

Version ?env:FIRECROWN_VERSION

Authors

Marc Paterno

Sandro Vitenti

Purpose of this Document

This tutorial explains how to use BinPairSelector objects to control which pairs of tomographic bins are included in two-point analyses. Selectors provide a powerful way to filter correlations based on bin names, measurement types, or custom criteria.

Why Use Bin Pair Selectors?

When working with two-point statistics, you often need to include only specific subsets of all possible bin pair combinations:

  • Auto-correlations only: Same bin correlated with itself
  • Cross-correlations only: Different bins correlated together
  • Source measurements only: Weak lensing (shear) correlations
  • Lens measurements only: Galaxy-count (clustering) correlations
  • Specific bin pairs: Explicitly named combinations
  • Physical consistency: Avoid unphysical correlations (e.g., lens behind source)
  • Redshift constraints: Limit correlations based on tomographic bin separation
  • Custom criteria: Combine multiple conditions with logical operators

Bin pair selectors let you express these requirements declaratively, making your code clearer and more maintainable.

Basic Concepts

A BinPairSelector is applied when creating TwoPointXY combinations from InferredGalaxyZDist bins. It determines whether to keep or discard each potential pair.

The BinPairSelector.keep Method

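Every selector implements a BinPairSelector.keep method. It is called for each candidate pair and receives the two bins of the pair (giving access to properties such as their names) together with the measurement type assigned to each bin.

The method returns True to keep the pair and False to discard it.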
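The following minimal sketch makes the keep contract concrete without depending on Firecrown. It assumes a bin can be modelled as a (bin_name, measurement) tuple and that the candidate pairs are all unordered combinations of bins, self-pairs included. ToySelector, ToyAutoName, KeepAll and make_pairs are hypothetical names used only for illustration. Firecrown's real keep signature and pair generation may differ.

```python
from itertools import combinations_with_replacement

# Hypothetical stand-in for a bin: (bin name, measurement type).
Bin = tuple[str, str]


class ToySelector:
    """Hypothetical base class mirroring the keep() contract."""

    def keep(self, x: Bin, y: Bin) -> bool:
        raise NotImplementedError


class ToyAutoName(ToySelector):
    """Keep a pair only when both bins have the same name."""

    def keep(self, x: Bin, y: Bin) -> bool:
        return x[0] == y[0]


class KeepAll(ToySelector):
    """Keep every candidate pair (no filtering)."""

    def keep(self, x: Bin, y: Bin) -> bool:
        return True


def make_pairs(bins: list[Bin], selector: ToySelector) -> list[tuple[Bin, Bin]]:
    """Enumerate unordered pairs (self-pairs included) and apply the selector."""
    return [(x, y) for x, y in combinations_with_replacement(bins, 2) if selector.keep(x, y)]


bins = [(f"lsst_y1_source{i}", "SHEAR_E") for i in range(5)]
bins += [(f"lsst_y1_lens{i}", "COUNTS") for i in range(5)]

print(len(make_pairs(bins, KeepAll())))      # 55 candidate pairs
print(len(make_pairs(bins, ToyAutoName())))  # 10 auto-correlations
```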

Physical Relevance in Bin Selection

When selecting bin pairs for two-point analyses, you may need to ensure that the selected pairs are physically meaningful. Not every mathematically possible bin combination corresponds to a useful correlation.

Galaxy-Galaxy Lensing: A Key Example

Consider galaxy-galaxy lensing, which measures the correlation between:

  • Source galaxies (background): Whose shapes are distorted by lensing
  • Lens galaxies (foreground): Whose gravitational field causes the lensing

For lensing to occur, the lens must be in front of the source (at lower or comparable redshift). How you ensure this depends on your data and naming conventions.

The Building Block Approach

Firecrown’s SourceLensBinPairSelector is a fundamental building block that selects all (Source, Lens) measurement pairs. By itself, it doesn’t enforce any redshift ordering; it simply selects based on measurement types.

If your bins follow a naming convention where indices correspond to increasing redshift (e.g., bin0 < bin1 < bin2 in redshift), you can combine SourceLensBinPairSelector with NameDiffBinPairSelector to select only pairs where the lens could plausibly be in front of the source.

Other approaches are possible. In the future, Firecrown may include selectors that inspect the dndz distributions of bins directly to determine redshift ordering, as an alternative to filtering by naming convention.

Naming Conventions for Redshift Ordering

Many surveys use a convention where tomographic bins are numbered in order of increasing redshift:

  • lsst_y1_source0 contains sources at the lowest redshift
  • lsst_y1_source4 contains sources at the highest redshift
  • Similarly: lsst_y1_lens0 < lsst_y1_lens1 < … in redshift

If your bins follow this convention, NameDiffBinPairSelector provides a convenient way to filter by redshift proximity. If not, you may need different selectors or manual filtering.

Example: Combining Selectors for Redshift Constraints

Here’s how you can add redshift constraints if your bins follow the naming convention:

from firecrown.metadata_types import (
    SourceLensBinPairSelector,
    NameDiffBinPairSelector,
)
from firecrown.generators import (
    LSST_Y1_LENS_HARMONIC_BIN_COLLECTION,
    LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION,
)
from firecrown.metadata_functions import make_binned_two_point_filtered

# Generate bins (these follow the naming convention: higher index = higher redshift)
count_bins = LSST_Y1_LENS_HARMONIC_BIN_COLLECTION.generate()
shear_bins = LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION.generate()
all_bins = count_bins + shear_bins

# Basic approach: ALL source-lens pairs (measurement types only)
basic_selector = SourceLensBinPairSelector()
basic_pairs = make_binned_two_point_filtered(all_bins, basic_selector)

# Redshift-constrained approach: Add index-based filtering
# This works because LSST bins follow the naming convention (index ∝ redshift)
# neighbors_diff=[0, 1] means: source_index - lens_index ∈ {0, 1}
# i.e., lens at same or slightly lower redshift
redshift_constrained_selector = SourceLensBinPairSelector() & NameDiffBinPairSelector(
    same_name_prefix=False,  # Different prefixes (source vs lens)
    neighbors_diff=[0, 1],   # lens_index ∈ {source_index, source_index - 1}
)
redshift_constrained_pairs = make_binned_two_point_filtered(all_bins, redshift_constrained_selector)

print(f"Basic selection (measurement types): {len(basic_pairs)} pairs")
print(f"Redshift-constrained selection: {len(redshift_constrained_pairs)} pairs")
print(f"Filtered out {len(basic_pairs) - len(redshift_constrained_pairs)} pairs based on index")
Basic selection (measurement types): 25 pairs
Redshift-constrained selection: 9 pairs
Filtered out 16 pairs based on index

Here are the pairs kept by the redshift-constrained selector:

import pandas as pd
from IPython.display import Markdown, display

# Show sample of redshift-constrained pairs
pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in redshift_constrained_pairs
]

df = pd.DataFrame(pairs_table)
display(Markdown(df.to_markdown(index=False)))
bin-x bin-y measurement-x measurement-y
lsst_y1_source0 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source1 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source1 lsst_y1_lens1 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source2 lsst_y1_lens1 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source2 lsst_y1_lens2 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source3 lsst_y1_lens2 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source3 lsst_y1_lens3 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source4 lsst_y1_lens3 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source4 lsst_y1_lens4 Galaxies.SHEAR_E Galaxies.COUNTS

The selector configuration:

print(f"Selector: {redshift_constrained_selector}")
print("\nComponents:")
print("  - SourceLensBinPairSelector: selects (Source, Lens) measurement pairs")
print("  - NameDiffBinPairSelector:")
print("      same_name_prefix=False  (different bin types allowed)")
print(f"      neighbors_diff=[0, 1]   (source_index - lens_index ∈ {{0, 1}})")
Selector: kind='and' pair_selectors=[SourceLensBinPairSelector(kind='source-lens'), NameDiffBinPairSelector(kind='name-diff', same_name_prefix=False, neighbors_diff=[0, 1])]

Components:
  - SourceLensBinPairSelector: selects (Source, Lens) measurement pairs
  - NameDiffBinPairSelector:
      same_name_prefix=False  (different bin types allowed)
      neighbors_diff=[0, 1]   (source_index - lens_index ∈ {0, 1})
Design Philosophy: Flexible Building Blocks

Firecrown provides atomic selector building blocks rather than prescribing one “correct” way to select bin pairs. This flexibility is essential because:

  • Different analyses have different requirements for which pairs to include
  • Physical constraints can be encoded in different ways (naming conventions, dndz inspection, manual lists, etc.)
  • Redshift overlap between bins varies by survey
  • Some analyses may intentionally include broader selections

SourceLensBinPairSelector is the fundamental building block: it selects (Source, Lens) measurement pairs. You can then combine it with other selectors to add constraints appropriate to your data.

When a selector pattern proves broadly useful (like 3×2pt with the naming convention), it’s added as a pre-built CompositeSelector.

Common Selector Types

Below are some common selectors you might use in your code. Note that we show only a fraction of their output for brevity.

Auto-Correlation Selectors

Select only pairs where both bins are the same:

from firecrown.metadata_types import AutoNameBinPairSelector
from firecrown.generators import (
    LSST_Y1_LENS_HARMONIC_BIN_COLLECTION,
    LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION,
)
from firecrown.metadata_functions import make_binned_two_point_filtered

# Generate LSST Y1 bins
count_bins = LSST_Y1_LENS_HARMONIC_BIN_COLLECTION.generate()
shear_bins = LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION.generate()
all_bins = count_bins + shear_bins

# Select only auto-correlations (same bin name)
auto_name_selector = AutoNameBinPairSelector()
auto_name_pairs = make_binned_two_point_filtered(all_bins, auto_name_selector)

print(f"Total bins: {len(all_bins)}")
print(f"Auto-correlation pairs (by name): {len(auto_name_pairs)}")
Total bins: 10
Auto-correlation pairs (by name): 10

To select pairs whose measurement types match, use AutoMeasurementBinPairSelector. This selector requires only the measurement types to match, not the bin names, so it also keeps cross-bin pairs such as (lsst_y1_source0, lsst_y1_source3):

from firecrown.metadata_types import AutoMeasurementBinPairSelector

# Select pairs with same measurement type
auto_measurement_selector = AutoMeasurementBinPairSelector()
auto_measurement_pairs = make_binned_two_point_filtered(
    all_bins, auto_measurement_selector
)

print(f"Auto-correlation pairs (by measurement): {len(auto_measurement_pairs)}")
Auto-correlation pairs (by measurement): 30

Display every third auto-measurement pair:

import pandas as pd
from IPython.display import Markdown

auto_measurement_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in auto_measurement_pairs[::3]  # Show every 3rd pair
]

df = pd.DataFrame(auto_measurement_pairs_table)
Markdown(df.to_markdown(index=False))
bin-x bin-y measurement-x measurement-y
lsst_y1_source0 lsst_y1_source0 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source0 lsst_y1_source3 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source1 lsst_y1_source2 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source2 lsst_y1_source2 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source3 lsst_y1_source3 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_lens0 lsst_y1_lens0 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens0 lsst_y1_lens3 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens1 lsst_y1_lens2 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens2 lsst_y1_lens2 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens3 lsst_y1_lens3 Galaxies.COUNTS Galaxies.COUNTS

Cross-Correlation Selectors

Select only pairs where bins are different:

from firecrown.metadata_types import CrossNameBinPairSelector

# Select only cross-correlations (different bin names)
cross_selector = CrossNameBinPairSelector()
cross_pairs = make_binned_two_point_filtered(all_bins, cross_selector)

print(f"Cross-correlation pairs: {len(cross_pairs)}")
Cross-correlation pairs: 45

Display the cross-correlation pairs:

import pandas as pd
from IPython.display import Markdown

cross_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in cross_pairs[::5]  # Show every 5th pair
]

df = pd.DataFrame(cross_pairs_table)
Markdown(df.to_markdown(index=False))
bin-x bin-y measurement-x measurement-y
lsst_y1_source0 lsst_y1_source1 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source1 lsst_y1_source3 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source0 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source1 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source2 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source3 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source4 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_lens0 lsst_y1_lens1 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens1 lsst_y1_lens3 Galaxies.COUNTS Galaxies.COUNTS

Measurement Type Selectors

Select based on the type of measurement:

from firecrown.metadata_types import SourceBinPairSelector, LensBinPairSelector

# Source measurements (weak lensing shear)
source_selector = SourceBinPairSelector()
source_pairs = make_binned_two_point_filtered(all_bins, source_selector)

# Lens measurements (galaxy counts)
lens_selector = LensBinPairSelector()
lens_pairs = make_binned_two_point_filtered(all_bins, lens_selector)

print(f"Source (shear) pairs: {len(source_pairs)}")
print(f"Lens (counts) pairs: {len(lens_pairs)}")
Source (shear) pairs: 15
Lens (counts) pairs: 15

Source-lens cross-correlations:

from firecrown.metadata_types import SourceLensBinPairSelector

# Source × Lens cross-correlations
source_lens_selector = SourceLensBinPairSelector()
source_lens_pairs = make_binned_two_point_filtered(all_bins, source_lens_selector)

print(f"Source × Lens pairs: {len(source_lens_pairs)}")
Source × Lens pairs: 25

Display source-lens pairs:

import pandas as pd
from IPython.display import Markdown

source_lens_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in source_lens_pairs[::5]  # Show every 5th pair
]

df = pd.DataFrame(source_lens_pairs_table)
Markdown(df.to_markdown(index=False))
bin-x bin-y measurement-x measurement-y
lsst_y1_source0 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source1 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source2 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source3 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source4 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
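The pair counts reported in this section follow from simple combinatorics, assuming the candidates are unordered combinations of the 10 bins with self-pairs included. The 30 auto-measurement pairs are the 15 shear × shear pairs plus the 15 counts × counts pairs. Here is a plain-Python check that does not need Firecrown:

```python
from math import comb

n_s, n_l = 5, 5          # source and lens bins in the LSST Y1 collections
n = n_s + n_l            # 10 bins in total

same_type_source = comb(n_s, 2) + n_s   # shear x shear, autos included
same_type_lens = comb(n_l, 2) + n_l     # counts x counts, autos included
source_lens = n_s * n_l                 # every (source, lens) combination
total = comb(n, 2) + n                  # all unordered pairs, autos included

print(same_type_source, same_type_lens, source_lens, total)  # 15 15 25 55
assert same_type_source + same_type_lens + source_lens == total
print(total - n)  # 45 cross-correlations (distinct bin names)
```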

Named Pair Selectors

Select specific bin name combinations explicitly:

from firecrown.metadata_types import NamedBinPairSelector

# Select specific pairs by name
named_selector = NamedBinPairSelector(
    names=[
        ("lsst_y1_lens0", "lsst_y1_lens1"),
        ("lsst_y1_source0", "lsst_y1_source0"),
        ("lsst_y1_source0", "lsst_y1_lens0"),
    ]
)
named_pairs = make_binned_two_point_filtered(all_bins, named_selector)

print(f"Named pairs: {len(named_pairs)}")
Named pairs: 3

Note: Matching is order-dependent. ("bin0", "bin1") is different from ("bin1", "bin0"). Include both if you want symmetric matching.
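Order dependence amounts to ordered tuple membership. The sketch below assumes, as the note states, that the (x, y) bin names are compared in order against the listed tuples. matches is a hypothetical helper, and the last lines apply the suggested fix of listing both orders.

```python
# Hypothetical illustration of order-dependent name matching.
names = [("lsst_y1_lens0", "lsst_y1_lens1")]


def matches(x_name: str, y_name: str, allowed: list[tuple[str, str]]) -> bool:
    """Ordered match: (a, b) and (b, a) are different entries."""
    return (x_name, y_name) in allowed


print(matches("lsst_y1_lens0", "lsst_y1_lens1", names))  # True
print(matches("lsst_y1_lens1", "lsst_y1_lens0", names))  # False

# Symmetric matching: add the reversed tuple of every entry.
symmetric = names + [(b, a) for a, b in names]
print(matches("lsst_y1_lens1", "lsst_y1_lens0", symmetric))  # True
```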

Display named pairs:

import pandas as pd
from IPython.display import Markdown

named_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in named_pairs
]

df = pd.DataFrame(named_pairs_table)
Markdown(df.to_markdown(index=False))
bin-x bin-y measurement-x measurement-y
lsst_y1_source0 lsst_y1_source0 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source0 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_lens0 lsst_y1_lens1 Galaxies.COUNTS Galaxies.COUNTS

Index-Based Selectors

Index-based selectors filter pairs based on the numeric indices in bin names and their text prefixes. They are useful for implementing physical constraints based on the redshift-ordering convention.

Understanding Bin Name Structure

Bin names follow the pattern <prefix><number>:

  • lsst_y1_source0 → prefix: lsst_y1_source, index: 0
  • lsst_y1_lens3 → prefix: lsst_y1_lens, index: 3
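A regular expression can split a name into prefix and index, assuming the index is the trailing run of digits. split_bin_name is a hypothetical helper for illustration, and Firecrown's own parsing may handle edge cases differently.

```python
import re

# Non-greedy prefix, then the trailing digits as the index.
_NAME_RE = re.compile(r"^(?P<prefix>.*?)(?P<index>\d+)$")


def split_bin_name(name: str) -> tuple[str, int]:
    """Return (prefix, index) for names of the form <prefix><number>."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"bin name {name!r} does not end with a number")
    return match.group("prefix"), int(match.group("index"))


print(split_bin_name("lsst_y1_source0"))  # ('lsst_y1_source', 0)
print(split_bin_name("lsst_y1_lens3"))    # ('lsst_y1_lens', 3)
```

Because the prefix is matched non-greedily, a multi-digit suffix stays whole: "bin12" splits into ("bin", 12), not ("bin1", 2).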

NameDiffBinPairSelector: Index-Based Filtering

This selector is useful when your bins follow a naming convention where the numeric index correlates with some property (typically redshift). It filters based on:

  1. Whether bin prefixes match (same_name_prefix)
  2. The difference between bin indices (neighbors_diff)

The index difference is computed as left_index - right_index, where the left bin is the first (x) bin of the pair and the right bin is the second (y) bin.

from firecrown.metadata_types import NameDiffBinPairSelector

# Example 1: Same prefix, adjacent or identical bins
# Keeps bins differing by 0, ±1 in index
same_prefix_selector = NameDiffBinPairSelector(
    same_name_prefix=True,
    neighbors_diff=[0, 1, -1]
)

# Example 2: Different prefixes, lens at lower/equal redshift
# For galaxy-galaxy lensing: source_index - lens_index ∈ {0, 1}
diff_prefix_selector = NameDiffBinPairSelector(
    same_name_prefix=False,
    neighbors_diff=[0, 1]
)

# Apply to our bins
same_prefix_pairs = make_binned_two_point_filtered(all_bins, same_prefix_selector)
diff_prefix_pairs = make_binned_two_point_filtered(all_bins, diff_prefix_selector)

print(f"Same prefix, nearby indices: {len(same_prefix_pairs)} pairs")
print(f"Different prefix, controlled indices: {len(diff_prefix_pairs)} pairs")
Same prefix, nearby indices: 18 pairs
Different prefix, controlled indices: 9 pairs
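The two counts printed above (18 and 9) can be reproduced with a small stand-in. This sketch assumes the candidates are unordered combinations with source bins listed before lens bins, so source-lens pairs appear as (source, lens) as in the tables. It also assumes a pair is kept when the prefix test passes and left_index - right_index is in neighbors_diff. name_diff_keep is a hypothetical helper, not Firecrown's implementation.

```python
import re
from itertools import combinations_with_replacement


def split_bin_name(name: str) -> tuple[str, int]:
    """Split '<prefix><number>' into (prefix, number)."""
    match = re.fullmatch(r"(.*?)(\d+)", name)
    if match is None:
        raise ValueError(name)
    return match.group(1), int(match.group(2))


def name_diff_keep(left: str, right: str, same_name_prefix: bool,
                   neighbors_diff: list[int]) -> bool:
    """Hypothetical stand-in for the NameDiffBinPairSelector rule."""
    left_prefix, left_index = split_bin_name(left)
    right_prefix, right_index = split_bin_name(right)
    if (left_prefix == right_prefix) != same_name_prefix:
        return False  # prefix test failed
    return (left_index - right_index) in neighbors_diff


# Sources first, then lenses, so source-lens pairs come out as (source, lens).
names = [f"lsst_y1_source{i}" for i in range(5)] + [f"lsst_y1_lens{i}" for i in range(5)]
pairs = list(combinations_with_replacement(names, 2))

same = [p for p in pairs if name_diff_keep(*p, same_name_prefix=True, neighbors_diff=[0, 1, -1])]
diff = [p for p in pairs if name_diff_keep(*p, same_name_prefix=False, neighbors_diff=[0, 1])]
print(len(same), len(diff))  # 18 9
```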

Key behaviors:

| Configuration | Keeps | Rejects | Use case |
|---|---|---|---|
| same_name_prefix=True, neighbors_diff=1 | (bin0, bin1), (bin1, bin0) | (bin0, bin2), (bin2, bin2), (src0, bin0) | Adjacent bins, same type |
| same_name_prefix=False, neighbors_diff=[0, 1] | (src0, lens0), (src1, lens0), (src1, lens1) | (src0, lens2), (lens1, src0) | Cross-type, redshift constraint |
| same_name_prefix=True, neighbors_diff=0 | (bin0, bin0), (bin1, bin1) | (bin0, bin1) | Exact auto-correlations |

AutoNameDiffBinPairSelector

Convenience selector for same_name_prefix=True:

from firecrown.metadata_types import AutoNameDiffBinPairSelector

# Equivalent to NameDiffBinPairSelector(same_name_prefix=True, neighbors_diff=[0, 1, -1])
auto_neighbor_selector = AutoNameDiffBinPairSelector(neighbors_diff=[0, 1, -1])
auto_neighbor_pairs = make_binned_two_point_filtered(all_bins, auto_neighbor_selector)

print(f"Auto with neighbors: {len(auto_neighbor_pairs)} pairs")
Auto with neighbors: 18 pairs

Use case: When bins follow a naming convention with increasing redshift, this selects correlations between nearby redshift bins of the same type (e.g., source bins or lens bins).

CrossNameDiffBinPairSelector

Convenience selector for same_name_prefix=False:

from firecrown.metadata_types import CrossNameDiffBinPairSelector

# Equivalent to NameDiffBinPairSelector(same_name_prefix=False, neighbors_diff=[0, 1])
cross_neighbor_selector = CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
cross_neighbor_pairs = make_binned_two_point_filtered(all_bins, cross_neighbor_selector)

print(f"Cross-type with neighbors: {len(cross_neighbor_pairs)} pairs")
Cross-type with neighbors: 9 pairs

Use case: When bins follow a naming convention with increasing redshift, this selects cross-type correlations (e.g., source-lens) with redshift constraints based on index differences.

Display some cross-type neighbor pairs:

import pandas as pd
from IPython.display import Markdown

cross_neighbor_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "index-diff": int(pair.x.bin_name[-1]) - int(pair.y.bin_name[-1]),
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in cross_neighbor_pairs[::5]  # Every 5th pair
]

df = pd.DataFrame(cross_neighbor_table)
Markdown(df.to_markdown(index=False))
bin-x bin-y index-diff measurement-x measurement-y
lsst_y1_source0 lsst_y1_lens0 0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source3 lsst_y1_lens2 1 Galaxies.SHEAR_E Galaxies.COUNTS

Logical Combinations

Selectors support logical operations to build complex criteria:

AND Operator (&)

Keep pairs that satisfy both conditions:

# Auto-correlations that are also source measurements
auto_source_selector = AutoNameBinPairSelector() & SourceBinPairSelector()
auto_source_pairs = make_binned_two_point_filtered(all_bins, auto_source_selector)

print(f"Auto-correlation source pairs: {len(auto_source_pairs)}")
Auto-correlation source pairs: 5

Display auto-correlation source pairs:

import pandas as pd
from IPython.display import Markdown

auto_source_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in auto_source_pairs
]

df = pd.DataFrame(auto_source_pairs_table)
Markdown(df.to_markdown(index=False))
bin-x bin-y measurement-x measurement-y
lsst_y1_source0 lsst_y1_source0 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source1 lsst_y1_source1 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source2 lsst_y1_source2 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source3 lsst_y1_source3 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source4 lsst_y1_source4 Galaxies.SHEAR_E Galaxies.SHEAR_E

OR Operator (|)

Keep pairs that satisfy either condition:

# Either auto-correlations OR source-lens cross-correlations
mixed_selector = AutoNameBinPairSelector() | SourceLensBinPairSelector()
mixed_pairs = make_binned_two_point_filtered(all_bins, mixed_selector)

print(f"Auto OR source-lens pairs: {len(mixed_pairs)}")
Auto OR source-lens pairs: 35

Display mixed pairs:

import pandas as pd
from IPython.display import Markdown

mixed_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in mixed_pairs[::5]  # Show every 5th pair
]

df = pd.DataFrame(mixed_pairs_table)
Markdown(df.to_markdown(index=False))
bin-x bin-y measurement-x measurement-y
lsst_y1_source0 lsst_y1_source0 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source0 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source1 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source2 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source3 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source4 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_lens0 lsst_y1_lens0 Galaxies.COUNTS Galaxies.COUNTS

NOT Operator (~)

Keep pairs that do not satisfy the condition:

# Everything except auto-correlations (i.e., cross-correlations)
not_auto_selector = ~AutoNameBinPairSelector()
not_auto_pairs = make_binned_two_point_filtered(all_bins, not_auto_selector)

print(f"Non-auto (cross) pairs: {len(not_auto_pairs)}")
Non-auto (cross) pairs: 45

Display non-auto pairs:

import pandas as pd
from IPython.display import Markdown

not_auto_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in not_auto_pairs[::5]  # Show every 5th pair
]

df = pd.DataFrame(not_auto_pairs_table)
Markdown(df.to_markdown(index=False))
bin-x bin-y measurement-x measurement-y
lsst_y1_source0 lsst_y1_source1 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source1 lsst_y1_source3 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source0 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source1 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source2 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source3 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source4 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_lens0 lsst_y1_lens1 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens1 lsst_y1_lens3 Galaxies.COUNTS Galaxies.COUNTS

Complex Combinations

Combine multiple operators:

# (Auto-correlations AND source measurements) OR (lens auto-correlations)
complex_selector = (
    (AutoNameBinPairSelector() & SourceBinPairSelector()) |
    (AutoNameBinPairSelector() & LensBinPairSelector())
)
complex_pairs = make_binned_two_point_filtered(all_bins, complex_selector)

print(f"Complex selection: {len(complex_pairs)}")
Complex selection: 10

Display complex pairs:

import pandas as pd
from IPython.display import Markdown

complex_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in complex_pairs[::2]  # Show every 2nd pair
]
df = pd.DataFrame(complex_pairs_table)
Markdown(df.to_markdown(index=False))
bin-x bin-y measurement-x measurement-y
lsst_y1_source0 lsst_y1_source0 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source2 lsst_y1_source2 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source4 lsst_y1_source4 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_lens1 lsst_y1_lens1 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens3 lsst_y1_lens3 Galaxies.COUNTS Galaxies.COUNTS
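Conceptually, the &, | and ~ operators build a new selector from existing ones. In plain Python, one way to get this behavior is to overload __and__, __or__ and __invert__. The sketch below does this, assuming a pair is two (bin_name, measurement) tuples. Sel, Fn and count are hypothetical names. Firecrown's real selectors are pydantic models that record the combination (the printed selector earlier shows an 'and' node holding a pair_selectors list), so treat this only as an illustration of the semantics.

```python
from itertools import combinations_with_replacement

Bin = tuple[str, str]  # (bin name, measurement)


class Sel:
    """Hypothetical selector base with logical operators."""

    def keep(self, x: Bin, y: Bin) -> bool:
        raise NotImplementedError

    def __and__(self, other: "Sel") -> "Sel":
        return Fn(lambda x, y: self.keep(x, y) and other.keep(x, y))

    def __or__(self, other: "Sel") -> "Sel":
        return Fn(lambda x, y: self.keep(x, y) or other.keep(x, y))

    def __invert__(self) -> "Sel":
        return Fn(lambda x, y: not self.keep(x, y))


class Fn(Sel):
    """Selector defined by an arbitrary predicate."""

    def __init__(self, predicate) -> None:
        self.predicate = predicate

    def keep(self, x: Bin, y: Bin) -> bool:
        return self.predicate(x, y)


auto = Fn(lambda x, y: x[0] == y[0])
source = Fn(lambda x, y: x[1] == y[1] == "SHEAR_E")
lens = Fn(lambda x, y: x[1] == y[1] == "COUNTS")
source_lens = Fn(lambda x, y: (x[1], y[1]) == ("SHEAR_E", "COUNTS"))

bins = [(f"lsst_y1_source{i}", "SHEAR_E") for i in range(5)]
bins += [(f"lsst_y1_lens{i}", "COUNTS") for i in range(5)]
pairs = list(combinations_with_replacement(bins, 2))


def count(sel: Sel) -> int:
    """Number of candidate pairs the selector keeps."""
    return sum(sel.keep(x, y) for x, y in pairs)


print(count(auto & source), count(auto | source_lens), count(~auto))  # 5 35 45
print(count((auto & source) | (auto & lens)))  # 10
```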

Composite Selectors

Composite selectors are specialized selectors that combine multiple simpler selectors according to specific logic. Unlike using logical operators (&, |, ~) directly, composite selectors implement domain-specific selection patterns as reusable classes.

Understanding Composite Selectors

A CompositeSelector is a base class for selectors that internally manage a list of other selectors. When you use composite selectors, you benefit from:

  • Encapsulation: Complex logic is packaged into a single, named selector
  • Reusability: Common patterns (like “auto-correlations”, “3×2pt”) can be used consistently
  • Clarity: Code intent is clearer with descriptive selector names

Example: The ThreeTwoBinPairSelector

The ThreeTwoBinPairSelector is a composite selector designed for “3×2pt” analyses, which combine three types of two-point correlations:

  1. Cosmic shear (source × source)
  2. Galaxy-galaxy lensing (source × lens)
  3. Galaxy clustering (lens × lens)

This is a standard observable combination in weak lensing surveys like DES, HSC, and LSST.

from firecrown.metadata_types import ThreeTwoBinPairSelector

# Create a 3×2pt selector
three_two_selector = ThreeTwoBinPairSelector(
    source_dist=1, lens_dist=1, source_lens_dist=5
)

# Apply to our bins
three_two_pairs = make_binned_two_point_filtered(all_bins, three_two_selector)

print(f"3×2pt pairs: {len(three_two_pairs)}")
3×2pt pairs: 28

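With these settings, the 3×2pt selector keeps:

  • Cosmic shear (source × source): auto-correlations plus cross-correlations of adjacent bins (source_dist=1)
  • Galaxy-galaxy lensing (source × lens): only pairs in which the lens bin index is strictly lower than the source bin index; with source_lens_dist=5, every such pair in this five-bin setup is kept
  • Galaxy clustering (lens × lens): auto-correlations plus cross-correlations of adjacent bins (lens_dist=1)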

Display 3×2pt pairs:

import pandas as pd
from IPython.display import Markdown

three_two_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in three_two_pairs
]

df = pd.DataFrame(three_two_pairs_table)
Markdown(df.to_markdown(index=False))
bin-x bin-y measurement-x measurement-y
lsst_y1_source0 lsst_y1_source0 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source0 lsst_y1_source1 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source1 lsst_y1_source1 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source1 lsst_y1_source2 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source2 lsst_y1_source2 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source2 lsst_y1_source3 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source3 lsst_y1_source3 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source3 lsst_y1_source4 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source4 lsst_y1_source4 Galaxies.SHEAR_E Galaxies.SHEAR_E
lsst_y1_source1 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source2 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source2 lsst_y1_lens1 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source3 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source3 lsst_y1_lens1 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source3 lsst_y1_lens2 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source4 lsst_y1_lens0 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source4 lsst_y1_lens1 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source4 lsst_y1_lens2 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_source4 lsst_y1_lens3 Galaxies.SHEAR_E Galaxies.COUNTS
lsst_y1_lens0 lsst_y1_lens0 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens0 lsst_y1_lens1 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens1 lsst_y1_lens1 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens1 lsst_y1_lens2 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens2 lsst_y1_lens2 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens2 lsst_y1_lens3 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens3 lsst_y1_lens3 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens3 lsst_y1_lens4 Galaxies.COUNTS Galaxies.COUNTS
lsst_y1_lens4 lsst_y1_lens4 Galaxies.COUNTS Galaxies.COUNTS
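As a cross-check, a short stand-in reproduces the 28 pairs listed in the table above. The rules it encodes are inferred from that table rather than taken from Firecrown's source. Same-type pairs are kept when their indices differ by at most source_dist (or lens_dist). Source-lens pairs are kept when the lens index is strictly below the source index by at most source_lens_dist. three_two_keep is a hypothetical helper.

```python
from itertools import combinations_with_replacement

Bin = tuple[str, int]  # (kind, index), e.g. ("source", 3)


def three_two_keep(x: Bin, y: Bin, source_dist: int, lens_dist: int,
                   source_lens_dist: int) -> bool:
    """Hypothetical 3x2pt rule inferred from the table of kept pairs."""
    (x_kind, x_idx), (y_kind, y_idx) = x, y
    if x_kind == y_kind == "source":             # cosmic shear
        return abs(x_idx - y_idx) <= source_dist
    if x_kind == y_kind == "lens":               # galaxy clustering
        return abs(x_idx - y_idx) <= lens_dist
    if (x_kind, y_kind) == ("source", "lens"):   # galaxy-galaxy lensing
        return 1 <= x_idx - y_idx <= source_lens_dist
    return False


# Sources first, so mixed pairs come out as (source, lens).
bins = [("source", i) for i in range(5)] + [("lens", j) for j in range(5)]
kept = [
    (x, y)
    for x, y in combinations_with_replacement(bins, 2)
    if three_two_keep(x, y, source_dist=1, lens_dist=1, source_lens_dist=5)
]
print(len(kept))  # 28 = 9 shear + 10 lensing + 9 clustering
```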

Creating Custom Composite Selectors

You can create your own composite selectors for domain-specific patterns. Here’s how to implement one:

from typing import Any

from firecrown.metadata_types import (
    CompositeSelector,
    LensBinPairSelector,
    SourceBinPairSelector,
    register_bin_pair_selector,
)
from firecrown.metadata_functions import make_binned_two_point_filtered


@register_bin_pair_selector
class CustomTwoTwoBinPairSelector(CompositeSelector):
    """Select 2×2pt: cosmic shear + galaxy clustering (no galaxy-galaxy lensing)."""

    kind: str = "custom_2x2pt"

    def model_post_init(self, _: Any, /) -> None:
        self._impl = SourceBinPairSelector() | LensBinPairSelector()


# Use the custom selector
two_two_selector = CustomTwoTwoBinPairSelector()
two_two_pairs = make_binned_two_point_filtered(all_bins, two_two_selector)

print(f"Custom 2×2pt pairs (no galaxy-galaxy lensing): {len(two_two_pairs)}")
Custom 2×2pt pairs (no galaxy-galaxy lensing): 30
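The custom selector above builds an internal selector in model_post_init and stores it in _impl. The delegation idea can be sketched without Firecrown or pydantic as a named class that builds its combined logic once and forwards keep calls to it. Everything here is hypothetical (ToyComposite, ToyTwoTwoSelector). In particular, the assumption that CompositeSelector forwards keep to _impl in exactly this way is not confirmed by this tutorial.

```python
from itertools import combinations_with_replacement
from typing import Callable

Bin = tuple[str, str]  # (bin name, measurement)
Predicate = Callable[[Bin, Bin], bool]


class ToyComposite:
    """Hypothetical composite: builds its logic once, then delegates keep()."""

    def __init__(self) -> None:
        # Analogous in spirit to setting self._impl in model_post_init.
        self._impl: Predicate = self.build()

    def build(self) -> Predicate:
        raise NotImplementedError

    def keep(self, x: Bin, y: Bin) -> bool:
        return self._impl(x, y)


class ToyTwoTwoSelector(ToyComposite):
    """Same-type pairs only: shear x shear or counts x counts."""

    def build(self) -> Predicate:
        def source(x: Bin, y: Bin) -> bool:
            return x[1] == y[1] == "SHEAR_E"

        def lens(x: Bin, y: Bin) -> bool:
            return x[1] == y[1] == "COUNTS"

        return lambda x, y: source(x, y) or lens(x, y)


bins = [(f"lsst_y1_source{i}", "SHEAR_E") for i in range(5)]
bins += [(f"lsst_y1_lens{i}", "COUNTS") for i in range(5)]
selector = ToyTwoTwoSelector()
kept = [p for p in combinations_with_replacement(bins, 2) if selector.keep(*p)]
print(len(kept))  # 30
```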

When to Use Composite Selectors

Use composite selectors when:

  • You have a well-defined, reusable selection pattern
  • The pattern combines multiple criteria in a specific way
  • You want to give the pattern a meaningful name (like “3×2pt”)
  • You need to serialize and share the pattern across analyses

Use direct logical operators (&, |, ~) when:

  • You need a one-off combination
  • The logic is simple and self-explanatory
  • You’re experimenting with different criteria

Combining Composite Selectors with Logical Operators

Composite selectors can be combined with other selectors using logical operators:

# 3×2pt but remove (lsst_y1_source4, lsst_y1_lens3) pair
three_two_rm = ThreeTwoBinPairSelector() & ~NamedBinPairSelector(
    names=[("lsst_y1_source4", "lsst_y1_lens3")]
)

# Use the combined selector
three_two_rm_pairs = make_binned_two_point_filtered(all_bins, three_two_rm)

print(
    f"3×2pt pairs with (lsst_y1_source4, lsst_y1_lens3) removed: {len(three_two_rm_pairs)}"
)
3×2pt pairs with (lsst_y1_source4, lsst_y1_lens3) removed: 39

This flexibility allows you to start with standard patterns (like 3×2pt) and refine them as needed for your specific analysis.

Advanced Physical Constraints

Beyond the pre-built selectors, you can construct sophisticated physical selections by combining multiple building blocks.

Combining Multiple Requirements

Real analyses often need to combine several constraints:

from firecrown.metadata_types import (
    SourceLensBinPairSelector,
    CrossNameDiffBinPairSelector,
    NamedBinPairSelector,
)

# Galaxy-galaxy lensing with multiple constraints
# (assumes bins follow naming convention where index correlates with redshift)
multi_constrained_gl = (
    SourceLensBinPairSelector()  # Select (Source, Lens) measurement pairs
    & CrossNameDiffBinPairSelector(neighbors_diff=[0, 1, 2])  # Index-based constraint
    & ~NamedBinPairSelector(  # Remove specific problematic pairs
        names=[
            ("lsst_y1_source4", "lsst_y1_lens3"),  # Example: noisy pair
        ]
    )
)

multi_constrained_gl_pairs = make_binned_two_point_filtered(all_bins, multi_constrained_gl)
print(f"Multi-constrained galaxy-galaxy lensing: {len(multi_constrained_gl_pairs)} pairs")
Multi-constrained galaxy-galaxy lensing: 11 pairs

This combines:

  1. Measurement type filtering: Only source-lens pairs
  2. Index-based constraint: keeps pairs whose source index exceeds the lens index by 0, 1, or 2 (assuming the redshift-ordering naming convention)
  3. Manual exclusions: Remove specific pairs known to be problematic
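The count of 11 pairs can be verified by hand. Index differences of 0, 1 and 2 give 5 + 4 + 3 = 12 source-lens pairs, and removing (lsst_y1_source4, lsst_y1_lens3) leaves 11. The snippet below does the same arithmetic in plain Python, assuming the index rule is source_index - lens_index in neighbors_diff. The variable names are illustrative only.

```python
# Hypothetical reproduction of the multi-constrained galaxy-galaxy lensing count.
excluded = {(4, 3)}        # (source index, lens index) removed by name
neighbors_diff = {0, 1, 2}

kept = [
    (src_idx, lens_idx)
    for src_idx in range(5)      # lsst_y1_source0..4
    for lens_idx in range(5)     # lsst_y1_lens0..4
    if (src_idx - lens_idx) in neighbors_diff and (src_idx, lens_idx) not in excluded
]
print(len(kept))  # 11
```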

TypeSourceBinPairSelector: Filtering by Data Provenance

The TypeSourceBinPairSelector filters based on how the data was obtained (e.g., spectroscopic vs photometric redshifts):

from firecrown.metadata_types import TypeSource, TypeSourceBinPairSelector

# Only correlate bins with matching data source
type_source_selector = TypeSourceBinPairSelector(
    type_source=TypeSource.DEFAULT
)

# Note: In our generated bins, all have the same type_source,
# so this selector doesn't filter anything here. In a real survey
# you might tag samples with different TypeSource values, e.g.:
# - one value for photometric-redshift bins
# - another for spectroscopic-redshift bins
# - separate values for bins from different surveys

type_filtered_pairs = make_binned_two_point_filtered(all_bins, type_source_selector)
print(f"Type-source filtered pairs: {len(type_filtered_pairs)}")
Type-source filtered pairs: 55

Use cases:

  • Separate analyses for spectroscopic vs photometric samples
  • Avoid mixing bins from different surveys with incompatible systematics
  • Ensure consistent redshift calibration within correlations

Building-Block Philosophy in Practice

The power of Firecrown’s selector system comes from composition. Here’s a typical workflow:

Step 1: Start with Measurement Types

# Start with the fundamental building block
step1 = SourceLensBinPairSelector()  # Selects (Source, Lens) measurement pairs
step1_pairs = make_binned_two_point_filtered(all_bins, step1)
print(f"Step 1 (measurement types only): {len(step1_pairs)} pairs")
Step 1 (measurement types only): 25 pairs

Step 2: Add Constraints Based on Your Data

# IF your bins follow a naming convention, add index-based filtering
step2 = (
    SourceLensBinPairSelector()
    & CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
)
step2_pairs = make_binned_two_point_filtered(all_bins, step2)
print(f"Step 2 (+ index constraint): {len(step2_pairs)} pairs")
print(f"  Filtered out {len(step1_pairs) - len(step2_pairs)} pairs")
Step 2 (+ index constraint): 9 pairs
  Filtered out 16 pairs

Step 3: Refine Based on Data Quality

# Remove specific pairs if needed
step3 = (
    SourceLensBinPairSelector()
    & CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
    & ~NamedBinPairSelector(
        names=[
            ("lsst_y1_source4", "lsst_y1_lens4"),  # Example problematic pair
        ]
    )
)
step3_pairs = make_binned_two_point_filtered(all_bins, step3)
print(f"Step 3 (+ manual exclusions): {len(step3_pairs)} pairs")
print(f"  Filtered out {len(step2_pairs) - len(step3_pairs)} additional pairs")
Step 3 (+ manual exclusions): 8 pairs
  Filtered out 1 additional pairs
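When you refine a selection step by step, it can help to log how many pairs each added constraint removes. The helper below is a hypothetical plain-Python sketch; `funnel` is not part of Firecrown. It applies predicates cumulatively to toy bins. The predicates mirror Steps 1 to 3, assuming the index constraint applies to source index minus lens index. It reproduces the 25 pairs, 16 removals and 1 additional removal printed above.

```python
from itertools import combinations_with_replacement
from typing import Callable, List, Tuple

Pair = Tuple[Tuple[str, str, int], Tuple[str, str, int]]


def funnel(pairs: List[Pair], steps: List[Tuple[str, Callable[[Pair], bool]]]):
    """Apply each predicate to the survivors of the previous one and record counts."""
    kept = list(pairs)
    report = []
    for label, predicate in steps:
        before = len(kept)
        kept = [p for p in kept if predicate(p)]
        report.append((label, len(kept), before - len(kept)))
    return kept, report


# Hypothetical toy bins: (name, kind, index); sources first, so mixed pairs are (source, lens).
bins = [(f"lsst_y1_source{i}", "source", i) for i in range(5)] + [
    (f"lsst_y1_lens{i}", "lens", i) for i in range(5)
]
pairs = list(combinations_with_replacement(bins, 2))

steps = [
    ("(source, lens) only", lambda p: p[0][1] == "source" and p[1][1] == "lens"),
    ("index difference in {0, 1}", lambda p: p[0][2] - p[1][2] in (0, 1)),
    ("drop (source4, lens4)",
     lambda p: (p[0][0], p[1][0]) != ("lsst_y1_source4", "lsst_y1_lens4")),
]
kept, report = funnel(pairs, steps)
for label, n_kept, n_removed in report:
    print(f"{label}: kept {n_kept}, removed {n_removed}")
```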

When to Create a Custom CompositeSelector

If you find yourself using the same complex selector combination repeatedly, consider creating a custom CompositeSelector class:

from typing import Any

from firecrown.metadata_types import (
    CompositeSelector,
    CrossNameDiffBinPairSelector,
    SourceLensBinPairSelector,
    register_bin_pair_selector,
)


@register_bin_pair_selector
class MyCustomSelector(CompositeSelector):
    """Custom selector: (Source, Lens) pairs with index difference in 0..max_separation."""

    kind: str = "my-custom-selector"
    max_separation: int = 2

    def model_post_init(self, _: Any, /) -> None:
        # Build the composite from the configuration fields above
        self._impl = (
            SourceLensBinPairSelector()
            & CrossNameDiffBinPairSelector(
                neighbors_diff=list(range(self.max_separation + 1))  # [0, 1, ..., max_separation]
            )
        )

This makes your code more maintainable and allows you to serialize/deserialize the selector configuration.
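Storing only configuration (here `max_separation`) and rebuilding the logic from it means the configuration alone determines the selection, so it can be written out and read back. The toy below illustrates that idea in plain Python with a frozen dataclass and JSON. `ToyNearbyLensingSelector` is hypothetical and does not reflect how Firecrown serializes selectors. Its keep rule assumes the index constraint applies to source index minus lens index.

```python
import json
from dataclasses import asdict, dataclass
from itertools import combinations_with_replacement


@dataclass(frozen=True)
class ToyNearbyLensingSelector:
    """Hypothetical stand-in for MyCustomSelector: all state is plain configuration."""

    kind: str = "my-custom-selector"
    max_separation: int = 2

    def keep(self, pair) -> bool:
        (_, kind_x, i), (_, kind_y, j) = pair
        # Toy equivalent of SourceLens & CrossNameDiff(neighbors_diff=[0, ..., max_separation])
        return kind_x == "source" and kind_y == "lens" and 0 <= i - j <= self.max_separation


# Hypothetical toy bins: (name, kind, index); sources first, so mixed pairs are (source, lens).
bins = [(f"lsst_y1_source{i}", "source", i) for i in range(5)] + [
    (f"lsst_y1_lens{i}", "lens", i) for i in range(5)
]
pairs = list(combinations_with_replacement(bins, 2))

selector = ToyNearbyLensingSelector(max_separation=2)
text = json.dumps(asdict(selector))                     # configuration only
restored = ToyNearbyLensingSelector(**json.loads(text))  # rebuilt from configuration

print(text)
print(sum(selector.keep(p) for p in pairs), sum(restored.keep(p) for p in pairs))
```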

Practical Example: Optimizing Signal-to-Noise

Combine physical constraints with signal-to-noise optimization:

# High-S/N selection: nearby bins only, exclude far-separated pairs
high_sn_3x2pt = (
    ThreeTwoBinPairSelector(
        source_dist=2,   # Very conservative
        lens_dist=2,
        source_lens_dist=1
    )
    # Could add more constraints here
)

high_sn_pairs = make_binned_two_point_filtered(all_bins, high_sn_3x2pt)
print(f"\nHigh-S/N 3×2pt selection: {len(high_sn_pairs)} pairs")
print("Trade-off: Fewer pairs but stronger signals per pair")

High-S/N 3×2pt selection: 28 pairs
Trade-off: Fewer pairs but stronger signals per pair

Physical 3×2pt Selection with Redshift Constraints

The ThreeTwoBinPairSelector implements the standard “3×2pt” cosmological analysis, which combines:

  1. Cosmic shear (source × source): Weak lensing auto-correlations
  2. Galaxy clustering (lens × lens): Galaxy position auto-correlations
  3. Galaxy-galaxy lensing (source × lens): Cross-correlations

Crucially, it uses NameDiffBinPairSelector internally to enforce physical redshift constraints.

Understanding the Distance Parameters

The selector has three parameters controlling which bin pairs are included:

from firecrown.metadata_types import ThreeTwoBinPairSelector

# Default: fairly permissive
default_3x2pt = ThreeTwoBinPairSelector(
    source_dist=5,       # Cosmic shear: source_i - source_j ∈ [-5, 5]
    lens_dist=5,         # Clustering: lens_i - lens_j ∈ [-5, 5]
    source_lens_dist=5   # Galaxy-galaxy lensing: source_i - lens_j ∈ {1,2,3,4,5}
)

default_pairs = make_binned_two_point_filtered(all_bins, default_3x2pt)
print(f"Default 3×2pt: {len(default_pairs)} pairs")
Default 3×2pt: 40 pairs

What each parameter does:

  • source_dist: For cosmic shear, controls which source bin index pairs are included
    • Difference: source_i - source_j ∈ [-source_dist, source_dist]
    • Example: source_dist=5 allows differences in {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5}
    • Larger values → more cross-redshift correlations → more data, but weaker signals
  • lens_dist: For galaxy clustering, same logic for lens bin pairs
    • Difference: lens_i - lens_j ∈ [-lens_dist, lens_dist]
  • source_lens_dist: For galaxy-galaxy lensing, enforces source_index - lens_index ∈ {1, 2, ..., source_lens_dist}
    • Does NOT include 0: Excludes same-index pairs to avoid mixing source/lens from same redshift bin
    • Only positive differences: Ensures lens is at lower redshift than source
    • Physically motivated: Lens must be in front of source
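The plain-Python sketch below makes the three rules concrete. It is not Firecrown code, and `allowed_differences` and `count_pairs` are hypothetical helpers. It builds the set of allowed index differences for each measurement combination, then counts the pairs those sets admit among five source and five lens bins. Cosmic-shear and clustering pairs are counted once per unordered pair, and galaxy-galaxy lensing pairs are counted as (source, lens).

```python
from itertools import combinations_with_replacement, product
from typing import Dict, Set


def allowed_differences(
    source_dist: int = 5, lens_dist: int = 5, source_lens_dist: int = 5
) -> Dict[str, Set[int]]:
    """Allowed index differences for each measurement combination."""
    return {
        "source x source": set(range(-source_dist, source_dist + 1)),  # symmetric
        "lens x lens": set(range(-lens_dist, lens_dist + 1)),  # symmetric
        "source x lens": set(range(1, source_lens_dist + 1)),  # 1..d only: no 0, no negatives
    }


def count_pairs(n_bins: int = 5, **dists: int) -> Dict[str, int]:
    allowed = allowed_differences(**dists)
    idx = range(n_bins)
    unordered = list(combinations_with_replacement(idx, 2))  # (i, j) with i <= j
    return {
        "shear": sum(1 for i, j in unordered if i - j in allowed["source x source"]),
        "clustering": sum(1 for i, j in unordered if i - j in allowed["lens x lens"]),
        "ggl": sum(
            1 for src, lns in product(idx, idx) if src - lns in allowed["source x lens"]
        ),
    }


counts = count_pairs()
print(counts, sum(counts.values()))
```

With the default distances this gives 15 + 15 + 10 = 40 pairs, matching the default 3×2pt count printed above.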

Physical Motivation for Distance Parameters

Why limit redshift separation?

  • Signal-to-noise decreases for widely separated bins
  • Computational cost increases with more pairs
  • Very distant bins may have negligible correlation

Why exclude same-index source-lens pairs?

  • Standard practice: source and lens samples are typically drawn from the same parent population, just split by different selection criteria
  • Correlating them at the same redshift would mix the selection effects in complicated ways
  • Different analyses may handle this differently, but ThreeTwoBinPairSelector follows the conservative standard

Tuning for Your Analysis

# Conservative: only nearby bins (fewer pairs, higher S/N per pair)
conservative_3x2pt = ThreeTwoBinPairSelector(
    source_dist=2,       # Source bins at most 2 indices apart
    lens_dist=2,         # Lens bins at most 2 indices apart
    source_lens_dist=2   # Source index exceeds lens index by 1 or 2
)
conservative_pairs = make_binned_two_point_filtered(all_bins, conservative_3x2pt)

# Aggressive: More cross-correlations (more data, weaker signals)
aggressive_3x2pt = ThreeTwoBinPairSelector(
    source_dist=10,
    lens_dist=10,
    source_lens_dist=8
)
aggressive_pairs = make_binned_two_point_filtered(all_bins, aggressive_3x2pt)

print(f"Conservative 3×2pt: {len(conservative_pairs)} pairs")
print(f"Default 3×2pt: {len(default_pairs)} pairs")
print(f"Aggressive 3×2pt: {len(aggressive_pairs)} pairs")
Conservative 3×2pt: 31 pairs
Default 3×2pt: 40 pairs
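Aggressive 3×2pt: 40 pairs

The aggressive selection returns the same 40 pairs as the default. This example has only five source and five lens bins (indices 0 to 4), so no index difference exceeds 4, and any distance of 4 or more already admits every allowed pair.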
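These counts follow from a short calculation. Take n bins per sample, with indices 0 to n-1. For each k from 0 to n-1, exactly n - k index pairs have difference k. This holds both for unordered pairs within one sample and for (source, lens) pairs whose source index minus lens index equals k. Write d_s, d_l and d_{sl} for `source_dist`, `lens_dist` and `source_lens_dist`:

```latex
\begin{aligned}
N_{\text{shear}}(d_s) &= \sum_{k=0}^{\min(d_s,\,n-1)} (n-k), \\
N_{\text{clust}}(d_l) &= \sum_{k=0}^{\min(d_l,\,n-1)} (n-k), \\
N_{\text{ggl}}(d_{sl}) &= \sum_{k=1}^{\min(d_{sl},\,n-1)} (n-k).
\end{aligned}
```

For n = 5, the settings give the following totals:

  • (2, 2, 2): 12 + 12 + 7 = 31
  • (5, 5, 5) and (10, 10, 8): 15 + 15 + 10 = 40, because each upper limit is capped at n - 1 = 4
  • (2, 2, 1), the high-S/N selection: 12 + 12 + 4 = 28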

Verifying Physical Consistency

Let’s verify that galaxy-galaxy lensing pairs have lens at lower indices:

import re

import pandas as pd
from IPython.display import Markdown, display

# Extract galaxy-galaxy lensing pairs
gl_pairs = [
    p for p in default_pairs
    if "source" in p.x.bin_name and "lens" in p.y.bin_name
]

# Parse indices
def get_index(bin_name):
    match = re.search(r'(\d+)$', bin_name)
    return int(match.group(1)) if match else None

gl_table = [
    {
        "source-bin": p.x.bin_name,
        "lens-bin": p.y.bin_name,
        "source-idx": get_index(p.x.bin_name),
        "lens-idx": get_index(p.y.bin_name),
        "diff": get_index(p.x.bin_name) - get_index(p.y.bin_name),
    }
    for p in gl_pairs[:10]  # Show first 10
]

df = pd.DataFrame(gl_table)
print("\nGalaxy-galaxy lensing pairs (source × lens):")
print("Note: diff = source_idx - lens_idx should be positive (lens at lower z)\n")
display(Markdown(df.to_markdown(index=False)))

Galaxy-galaxy lensing pairs (source × lens):
Note: diff = source_idx - lens_idx should be positive (lens at lower z)
| source-bin      | lens-bin      | source-idx | lens-idx | diff |
|:----------------|:--------------|-----------:|---------:|-----:|
| lsst_y1_source1 | lsst_y1_lens0 |          1 |        0 |    1 |
| lsst_y1_source2 | lsst_y1_lens0 |          2 |        0 |    2 |
| lsst_y1_source2 | lsst_y1_lens1 |          2 |        1 |    1 |
| lsst_y1_source3 | lsst_y1_lens0 |          3 |        0 |    3 |
| lsst_y1_source3 | lsst_y1_lens1 |          3 |        1 |    2 |
| lsst_y1_source3 | lsst_y1_lens2 |          3 |        2 |    1 |
| lsst_y1_source4 | lsst_y1_lens0 |          4 |        0 |    4 |
| lsst_y1_source4 | lsst_y1_lens1 |          4 |        1 |    3 |
| lsst_y1_source4 | lsst_y1_lens2 |          4 |        2 |    2 |
| lsst_y1_source4 | lsst_y1_lens3 |          4 |        3 |    1 |
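The same check can be turned into a reusable assertion. The sketch below is plain Python, and `bin_index` and `find_lens_behind_source` are hypothetical helpers. Like the code above, it assumes the trailing digits of each bin name encode the tomographic index, with a higher index meaning higher redshift. Unlike `get_index`, it raises an error for names without a trailing index instead of returning None.

```python
import re
from typing import Iterable, List, Tuple


def bin_index(bin_name: str) -> int:
    """Trailing integer of a bin name, e.g. 'lsst_y1_lens3' -> 3."""
    match = re.search(r"(\d+)$", bin_name)
    if match is None:
        raise ValueError(f"no trailing index in {bin_name!r}")
    return int(match.group(1))


def find_lens_behind_source(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (source, lens) name pairs whose source index is not above the lens index."""
    return [(src, lens) for src, lens in pairs if bin_index(src) - bin_index(lens) < 1]


good = [("lsst_y1_source1", "lsst_y1_lens0"), ("lsst_y1_source4", "lsst_y1_lens3")]
bad = good + [("lsst_y1_source2", "lsst_y1_lens2")]
print(find_lens_behind_source(good))  # []
print(find_lens_behind_source(bad))   # [('lsst_y1_source2', 'lsst_y1_lens2')]
```

With Firecrown objects, you would pass `[(p.x.bin_name, p.y.bin_name) for p in gl_pairs]`, which uses the same attributes as the code above, and assert that the result is empty.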

Using Selectors with Metadata Generation

When generating metadata from scratch, apply selectors to control which pairs are created:

import numpy as np
from firecrown.metadata_types import TwoPointHarmonic, AutoBinPairSelector

# Create selector for auto-correlations (name AND measurement)
auto_both_selector = AutoBinPairSelector()

# Filter pairs during generation
auto_both_pairs = make_binned_two_point_filtered(all_bins, auto_both_selector)

# Create harmonic-space metadata
ells = np.unique(np.geomspace(2, 2000, 128).astype(int))
auto_harmonic_metadata = [TwoPointHarmonic(XY=xy, ells=ells) for xy in auto_both_pairs]

print(f"Auto-correlation harmonic metadata: {len(auto_harmonic_metadata)}")
Auto-correlation harmonic metadata: 10
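The multipole grid in that example needs `np.unique` for a reason. Casting a log-spaced grid to integers produces repeated values at low ℓ, where consecutive grid points are less than 1 apart. The sketch below uses only NumPy and the same grid parameters to show the effect.

```python
import numpy as np

raw = np.geomspace(2, 2000, 128).astype(int)  # 128 points, truncated to integers
ells = np.unique(raw)                         # sorted, duplicates removed

print(f"raw grid: {raw.size} values, unique: {ells.size}")
print("first raw values:", raw[:6])
print("first unique ells:", ells[:6], "last:", ells[-1])
```

Without `np.unique`, the metadata would contain repeated multipoles at the low end of the grid.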

Using Selectors with SACC Extraction

When extracting from SACC files, apply selectors to filter which correlations are loaded:

from firecrown.likelihood.factories import load_sacc_data
from firecrown.metadata_functions import extract_all_real_metadata
from firecrown.metadata_types import AutoNameBinPairSelector, SourceBinPairSelector

# Load SACC file
sacc_data = load_sacc_data("../tests/sacc_data.hdf5")

# Extract only source auto-correlations
source_auto_selector = SourceBinPairSelector() & AutoNameBinPairSelector()
source_auto_metadata = extract_all_real_metadata(
    sacc_data, bin_pair_selector=source_auto_selector
)

print(f"Source auto-correlations from SACC: {len(source_auto_metadata)}")
Source auto-correlations from SACC: 8

Display the source auto-correlations:

import pandas as pd
from IPython.display import Markdown

source_auto_table = [
    {
        "bin-x": real.XY.x.bin_name,
        "bin-y": real.XY.y.bin_name,
        "measurement-x": str(real.XY.x_measurement),
        "measurement-y": str(real.XY.y_measurement),
    }
    for real in source_auto_metadata
]

df = pd.DataFrame(source_auto_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x             | measurement-y             |
|:------|:------|:--------------------------|:--------------------------|
| src0  | src0  | Galaxies.PART_OF_XI_MINUS | Galaxies.PART_OF_XI_MINUS |
| src1  | src1  | Galaxies.PART_OF_XI_MINUS | Galaxies.PART_OF_XI_MINUS |
| src2  | src2  | Galaxies.PART_OF_XI_MINUS | Galaxies.PART_OF_XI_MINUS |
| src3  | src3  | Galaxies.PART_OF_XI_MINUS | Galaxies.PART_OF_XI_MINUS |
| src0  | src0  | Galaxies.PART_OF_XI_PLUS  | Galaxies.PART_OF_XI_PLUS  |
| src1  | src1  | Galaxies.PART_OF_XI_PLUS  | Galaxies.PART_OF_XI_PLUS  |
| src2  | src2  | Galaxies.PART_OF_XI_PLUS  | Galaxies.PART_OF_XI_PLUS  |
| src3  | src3  | Galaxies.PART_OF_XI_PLUS  | Galaxies.PART_OF_XI_PLUS  |

Compare with extracting all data:

# Extract everything (no selector)
all_metadata = extract_all_real_metadata(sacc_data)

print(f"All correlations in SACC: {len(all_metadata)}")
print(f"Filtered to source auto: {len(source_auto_metadata)}")
All correlations in SACC: 45
Filtered to source auto: 8

Serialization

Selectors can be serialized to YAML for reuse:

from firecrown.utils import base_model_to_yaml, base_model_from_yaml

# Create a complex selector
selector = (AutoNameBinPairSelector() & SourceBinPairSelector()) | LensBinPairSelector()

# Serialize to YAML
selector_yaml = base_model_to_yaml(selector)
print("Serialized selector:")
print(selector_yaml)
Serialized selector:
kind: or
pair_selectors:
- kind: and
  pair_selectors:
  - {kind: auto-name}
  - {kind: source}
- {kind: lens}

Load from YAML:

from firecrown.metadata_types import BinPairSelector

# Deserialize
loaded_selector = base_model_from_yaml(BinPairSelector, selector_yaml)

# Use the loaded selector
loaded_pairs = make_binned_two_point_filtered(all_bins, loaded_selector)
print(f"Pairs from loaded selector: {len(loaded_pairs)}")
Pairs from loaded selector: 20
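The serialized form above is a tree. Composite nodes carry a `kind` (`and`, `or`) and a list of `pair_selectors`, and leaves carry only a `kind`, so deserializing amounts to dispatching on `kind` recursively. The sketch below mimics that round trip in plain Python, using JSON instead of YAML. `LEAVES` and `build` are hypothetical and do not reflect how Firecrown implements deserialization. On toy bins it reproduces the 20 pairs printed above, assuming `lens` keeps pairs of two lens bins.

```python
import json
from itertools import combinations_with_replacement

# Hypothetical leaf predicates keyed by "kind"; a pair is ((name, kind, index), (name, kind, index)).
LEAVES = {
    "auto-name": lambda p: p[0][0] == p[1][0],
    "source": lambda p: p[0][1] == p[1][1] == "source",
    "lens": lambda p: p[0][1] == p[1][1] == "lens",
}


def build(spec):
    """Recursively turn a {'kind': ..., 'pair_selectors': [...]} tree into a predicate."""
    kind = spec["kind"]
    if kind in ("and", "or"):
        children = [build(child) for child in spec["pair_selectors"]]
        combine = all if kind == "and" else any
        return lambda p: combine(child(p) for child in children)
    return LEAVES[kind]  # a KeyError signals an unknown kind


spec = {
    "kind": "or",
    "pair_selectors": [
        {"kind": "and", "pair_selectors": [{"kind": "auto-name"}, {"kind": "source"}]},
        {"kind": "lens"},
    ],
}
text = json.dumps(spec)         # serialize
keep = build(json.loads(text))  # deserialize and rebuild the predicate

bins = [(f"lsst_y1_source{i}", "source", i) for i in range(5)] + [
    (f"lsst_y1_lens{i}", "lens", i) for i in range(5)
]
pairs = list(combinations_with_replacement(bins, 2))
print(sum(1 for p in pairs if keep(p)))  # 20
```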

Summary

Bin pair selectors provide a declarative way to control which bin pairs are included in your analysis:

  • Atomic selectors: Auto, cross, source, lens, named pairs, index-based constraints
  • Physical relevance: Index-based selectors enforce redshift ordering conventions
  • Building blocks: Combine atomic selectors for domain-specific physics
  • Logical operators: AND (&), OR (|), NOT (~)
  • Integration: Works with both generators and SACC extraction
  • Serialization: Save and load selector configurations

Key Selectors for Physical Constraints

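  • All (Source, Lens) pairs: SourceLensBinPairSelector()
  • Galaxy-galaxy lensing (with naming convention): SourceLensBinPairSelector() combined with CrossNameDiffBinPairSelector via &
  • Nearby bins (with naming convention): CrossNameDiffBinPairSelector, with neighbors_diff listing the allowed index differences
  • Same data source: TypeSourceBinPairSelector(type_source=...)
  • Standard 3×2pt (with naming convention): ThreeTwoBinPairSelector(), tuned via source_dist, lens_dist and source_lens_dist
  • Custom high-S/N: ThreeTwoBinPairSelector with small distances (e.g. source_dist=2, lens_dist=2, source_lens_dist=1), optionally combined with further selectors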

Key Functions

| Function                            | Purpose                      | Selector parameter |
|:------------------------------------|:-----------------------------|:-------------------|
| make_binned_two_point_filtered      | Filter generated bins        | Required           |
| filter_two_point_combinations       | Filter existing combinations | Required           |
| extract_all_photoz_bin_combinations | Filter SACC bin combinations | Optional           |
| extract_all_harmonic_metadata       | Filter SACC harmonic data    | Optional           |
| extract_all_real_metadata           | Filter SACC real-space data  | Optional           |

Critical Takeaways

Key Principles
  1. Start with measurement types: SourceLensBinPairSelector selects (Source, Lens) pairs
  2. Add constraints as appropriate: Use naming conventions, manual lists, or (in the future) dndz inspection
  3. Building blocks enable flexibility: Compose atomic selectors for your specific requirements
  4. Different approaches are valid: Index-based filtering is convenient but not the only way
Best Practices
  1. Start simple, add complexity: Begin with measurement-type selectors, then add constraints
  2. Understand your data: Know whether your bins follow naming conventions or need other filtering
  3. Verify your selection: Check that the resulting pairs make sense for your analysis
  4. Document your choices: Be explicit about assumptions (e.g., “assumes bin index increases with redshift”)
  5. Create custom composites: If you reuse a pattern, make it a CompositeSelector

Next Steps

Now that you understand bin pair selectors: