Bin Pair Selectors: Filtering Two-Point Correlations

Authors

Marc Paterno

Sandro Vitenti

Purpose of this Document

This tutorial explains how to use BinPairSelector objects to control which pairs of tomographic bins are included in two-point analyses. Selectors filter correlations by bin name, measurement type, or custom criteria.

Why Use Bin Pair Selectors?

When working with two-point statistics, you often need to include only specific subsets of all possible bin pair combinations:

Auto-correlations only: Same bin correlated with itself
Cross-correlations only: Different bins correlated together
Source measurements only: Weak lensing (shear) correlations
Lens measurements only: Galaxy counts correlations
Specific bin pairs: Explicitly named combinations
Physical consistency: Avoid unphysical correlations (e.g., lens behind source)
Redshift constraints: Limit correlations based on tomographic bin separation
Custom criteria: Combine multiple conditions with logical operators

Bin pair selectors let you express these requirements declaratively, making your code clearer and more maintainable.

Basic Concepts

A BinPairSelector is applied when creating TwoPointXY combinations from InferredGalaxyZDist bins. It determines whether to keep or discard each potential pair.

The `BinPairSelector.keep` Method

Every selector implements a BinPairSelector.keep method that takes:

zdist: A pair of InferredGalaxyZDist objects (bin1, bin2)
m: A pair of Measurement types (measurement1, measurement2)

The method returns True to keep the pair, False to discard it.
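
To make this contract concrete, here is a minimal sketch of a keep-style predicate. The `Bin` class, the `keep_same_name` function, and the string measurement labels are stand-ins invented for illustration; they are not Firecrown's InferredGalaxyZDist or Measurement types. Only the two-argument shape (zdist, m) is taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """Hypothetical stand-in for a tomographic bin; only the name matters here."""

    bin_name: str


def keep_same_name(zdist: tuple[Bin, Bin], m: tuple[str, str]) -> bool:
    """Keep a pair only when both bins share a name (an auto-correlation).

    `m` is accepted to mirror the (zdist, m) signature but is not used.
    """
    bin1, bin2 = zdist
    return bin1.bin_name == bin2.bin_name


# Two candidate pairs: one auto-correlation and one cross-correlation.
candidates = [
    ((Bin("lens0"), Bin("lens0")), ("counts", "counts")),
    ((Bin("lens0"), Bin("lens1")), ("counts", "counts")),
]
kept = [zdist for zdist, m in candidates if keep_same_name(zdist, m)]
print([(a.bin_name, b.bin_name) for a, b in kept])
```

A real selector would be a class, but the decision it makes per pair has this same True/False form.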

Physical Relevance in Bin Selection

When selecting bin pairs for two-point analyses, you may need to ensure that the selected pairs are physically meaningful. Not every mathematically possible bin combination corresponds to a useful correlation.

Galaxy-Galaxy Lensing: A Key Example

Consider galaxy-galaxy lensing, which measures the correlation between:

Source galaxies (background): Whose shapes are distorted by lensing
Lens galaxies (foreground): Whose gravitational field causes the lensing

For lensing to occur, the lens must be in front of the source (at lower or comparable redshift). How you ensure this depends on your data and naming conventions.

The Building Block Approach

Firecrown’s SourceLensBinPairSelector is a fundamental building block that selects all (Source, Lens) measurement pairs. By itself, it doesn’t enforce any redshift ordering; it simply selects based on measurement types.

If your bins follow a naming convention where indices correspond to increasing redshift (e.g., bin0 < bin1 < bin2 in redshift), you can combine SourceLensBinPairSelector with NameDiffBinPairSelector to select only pairs where the lens could plausibly be in front of the source.

Other approaches are possible. In the future, Firecrown may include selectors that directly inspect the dndz distributions of bins to determine redshift ordering, providing an alternative to naming-convention-based filtering.

Naming Conventions for Redshift Ordering

Many surveys use a convention where tomographic bins are numbered in order of increasing redshift:

lsst_y1_source0 contains sources at the lowest redshift
lsst_y1_source4 contains sources at the highest redshift
Similarly: lsst_y1_lens0 < lsst_y1_lens1 < … in redshift

If your bins follow this convention, NameDiffBinPairSelector provides a convenient way to filter by redshift proximity. If not, you may need different selectors or manual filtering.
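
Before relying on index-based filtering, it may be worth checking that the convention actually holds for your bins. The sketch below does this in plain Python. The mean redshifts are made-up numbers (not LSST Y1 values) and `follows_convention` is a hypothetical helper, not part of Firecrown.

```python
def follows_convention(mean_z_by_index: dict[int, float]) -> bool:
    """True when the mean redshift strictly increases with the bin index."""
    ordered = [mean_z_by_index[i] for i in sorted(mean_z_by_index)]
    return all(lo < hi for lo, hi in zip(ordered, ordered[1:]))


# Made-up mean redshifts keyed by bin index (illustrative values only).
well_ordered = {0: 0.3, 1: 0.5, 2: 0.7}
mis_ordered = {0: 0.5, 1: 0.3, 2: 0.7}

print(follows_convention(well_ordered))
print(follows_convention(mis_ordered))
```

If a check like this fails, index-based selectors would not encode the redshift ordering you intend.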

Example: Combining Selectors for Redshift Constraints

Here’s how you can add redshift constraints if your bins follow the naming convention:

from firecrown.metadata_types import (
    SourceLensBinPairSelector,
    NameDiffBinPairSelector,
)
from firecrown.generators import (
    LSST_Y1_LENS_HARMONIC_BIN_COLLECTION,
    LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION,
)
from firecrown.metadata_functions import (
    make_binned_two_point_filtered,
    filter_two_point_combinations,
    make_all_photoz_bin_combinations,
)

# Generate bins (these follow the naming convention: higher index = higher redshift)
count_bins = LSST_Y1_LENS_HARMONIC_BIN_COLLECTION.generate()
shear_bins = LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION.generate()
all_bins = count_bins + shear_bins

# Basic approach: ALL source-lens pairs (measurement types only)
basic_selector = SourceLensBinPairSelector()
basic_pairs = make_binned_two_point_filtered(all_bins, basic_selector)

# Redshift-constrained approach: Add index-based filtering
# This works because LSST bins follow the naming convention (index ∝ redshift)
# neighbors_diff=[0, 1] means: source_index - lens_index ∈ {0, 1}
# i.e., lens at same or slightly lower redshift
redshift_constrained_selector = SourceLensBinPairSelector() & NameDiffBinPairSelector(
    same_name_prefix=False,  # Different prefixes (source vs lens)
    neighbors_diff=[0, 1],   # lens_index ∈ {source_index, source_index - 1}
)
redshift_constrained_pairs = make_binned_two_point_filtered(all_bins, redshift_constrained_selector)

print(f"Basic selection (measurement types): {len(basic_pairs)} pairs")
print(f"Redshift-constrained selection: {len(redshift_constrained_pairs)} pairs")
print(f"Filtered out {len(basic_pairs) - len(redshift_constrained_pairs)} pairs based on index")

Basic selection (measurement types): 25 pairs
Redshift-constrained selection: 9 pairs
Filtered out 16 pairs based on index

Here are the pairs kept by the redshift-constrained selector:

import pandas as pd
from IPython.display import Markdown

# Show sample of redshift-constrained pairs
pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in redshift_constrained_pairs
]

df = pd.DataFrame(pairs_table)
display(Markdown(df.to_markdown(index=False)))

bin-x	bin-y	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source1	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source1	lsst_y1_lens1	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source2	lsst_y1_lens1	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source2	lsst_y1_lens2	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source3	lsst_y1_lens2	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source3	lsst_y1_lens3	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source4	lsst_y1_lens3	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source4	lsst_y1_lens4	Galaxies.SHEAR_E	Galaxies.COUNTS
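
The counts above can be checked by hand: 5 source bins times 5 lens bins gives 25 pairs, and the constraint source_index - lens_index ∈ {0, 1} leaves 5 pairs with difference 0 plus 4 pairs with difference 1. The following sketch reproduces this with plain integer indices; it does not call Firecrown and only mirrors the arithmetic.

```python
from itertools import product

n_source = 5  # lsst_y1_source0 .. lsst_y1_source4
n_lens = 5  # lsst_y1_lens0 .. lsst_y1_lens4

# Every (source_index, lens_index) combination: the "basic" selection.
all_pairs = list(product(range(n_source), range(n_lens)))

# Keep pairs with source_index - lens_index in {0, 1}.
allowed_diffs = {0, 1}
constrained = [
    (source, lens) for source, lens in all_pairs if source - lens in allowed_diffs
]

print(len(all_pairs), len(constrained), len(all_pairs) - len(constrained))
print(constrained)
```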

The selector configuration:

print(f"Selector: {redshift_constrained_selector}")
print(f"\nComponents:")
print(f"  - SourceLensBinPairSelector: selects (Source, Lens) measurement pairs")
print(f"  - NameDiffBinPairSelector:")
print(f"      same_name_prefix=False  (different bin types allowed)")
print(f"      neighbors_diff=[0, 1]   (source_index - lens_index ∈ {{0, 1}})")

Selector: kind='and' pair_selectors=[SourceLensBinPairSelector(kind='source-lens'), NameDiffBinPairSelector(kind='name-diff', same_name_prefix=False, neighbors_diff=[0, 1])]

Components:
  - SourceLensBinPairSelector: selects (Source, Lens) measurement pairs
  - NameDiffBinPairSelector:
      same_name_prefix=False  (different bin types allowed)
      neighbors_diff=[0, 1]   (source_index - lens_index ∈ {0, 1})

Design Philosophy: Flexible Building Blocks

Firecrown provides atomic selector building blocks rather than prescribing one “correct” way to select bin pairs. This flexibility is essential because:

Different analyses have different requirements for which pairs to include
Physical constraints can be encoded in different ways (naming conventions, dndz inspection, manual lists, etc.)
Redshift overlap between bins varies by survey
Some analyses may intentionally include broader selections

SourceLensBinPairSelector is the fundamental building block: it selects (Source, Lens) measurement pairs. You can then combine it with other selectors to add constraints appropriate to your data:

If using naming conventions: Combine with NameDiffBinPairSelector
If listing pairs explicitly: Combine with NamedBinPairSelector
Future possibilities: Selectors that inspect dndz distributions directly

When a selector pattern proves broadly useful (like 3×2pt with the naming convention), it’s added as a pre-built CompositeSelector.

Common Selector Types

Below are some common selectors you might use in your code. Note that we show only a fraction of their output for brevity.

Auto-Correlation Selectors

Select only pairs where both bins are the same:

from firecrown.metadata_types import AutoNameBinPairSelector
from firecrown.generators import (
    LSST_Y1_LENS_HARMONIC_BIN_COLLECTION,
    LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION,
)
from firecrown.metadata_functions import make_binned_two_point_filtered

# Generate LSST Y1 bins
count_bins = LSST_Y1_LENS_HARMONIC_BIN_COLLECTION.generate()
shear_bins = LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION.generate()
all_bins = count_bins + shear_bins

# Select only auto-correlations (same bin name)
auto_name_selector = AutoNameBinPairSelector()
auto_name_pairs = make_binned_two_point_filtered(all_bins, auto_name_selector)

print(f"Total bins: {len(all_bins)}")
print(f"Auto-correlation pairs (by name): {len(auto_name_pairs)}")

Total bins: 10
Auto-correlation pairs (by name): 10

To require matching measurement types instead, use AutoMeasurementBinPairSelector. This selector requires only the measurement types to match, not the bin names:

from firecrown.metadata_types import AutoMeasurementBinPairSelector

# Select pairs with same measurement type
auto_measurement_selector = AutoMeasurementBinPairSelector()
auto_measurement_pairs = make_binned_two_point_filtered(
    all_bins, auto_measurement_selector
)

print(f"Auto-correlation pairs (by measurement): {len(auto_measurement_pairs)}")

Auto-correlation pairs (by measurement): 30

Display the auto-correlation pairs:

import pandas as pd
from IPython.display import Markdown

auto_measurement_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in auto_measurement_pairs[::3]  # Show every 3rd pair
]

df = pd.DataFrame(auto_measurement_pairs_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_source0	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source0	lsst_y1_source3	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source1	lsst_y1_source2	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source2	lsst_y1_source2	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source3	lsst_y1_source3	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_lens0	lsst_y1_lens0	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens0	lsst_y1_lens3	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens1	lsst_y1_lens2	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens2	lsst_y1_lens2	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens3	lsst_y1_lens3	Galaxies.COUNTS	Galaxies.COUNTS
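
The pair counts in this section follow from simple combinatorics, assuming each unordered pair of bins is considered once (which matches the totals printed in this tutorial). Ten bins give 10·11/2 = 55 candidate pairs, 10 of which share a name and 15 + 15 = 30 of which share a measurement type. The sketch uses (name, measurement) tuples as stand-ins rather than Firecrown objects.

```python
from itertools import combinations_with_replacement

# Stand-in bins as (name, measurement) tuples; not Firecrown objects.
bins = [(f"source{i}", "shear") for i in range(5)] + [
    (f"lens{i}", "counts") for i in range(5)
]

# Each unordered pair once, including a bin paired with itself.
pairs = list(combinations_with_replacement(bins, 2))

same_name = [p for p in pairs if p[0][0] == p[1][0]]
same_measurement = [p for p in pairs if p[0][1] == p[1][1]]

print(len(pairs), len(same_name), len(same_measurement))
```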

Cross-Correlation Selectors

Select only pairs where bins are different:

from firecrown.metadata_types import CrossNameBinPairSelector

# Select only cross-correlations (different bin names)
cross_selector = CrossNameBinPairSelector()
cross_pairs = make_binned_two_point_filtered(all_bins, cross_selector)

print(f"Cross-correlation pairs: {len(cross_pairs)}")

Cross-correlation pairs: 45

Display the cross-correlation pairs:

import pandas as pd
from IPython.display import Markdown

cross_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in cross_pairs[::5]  # Show every 5th pair
]

df = pd.DataFrame(cross_pairs_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_source1	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source1	lsst_y1_source3	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source0	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source1	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source2	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source3	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source4	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_lens0	lsst_y1_lens1	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens1	lsst_y1_lens3	Galaxies.COUNTS	Galaxies.COUNTS

Measurement Type Selectors

Select based on the type of measurement:

from firecrown.metadata_types import SourceBinPairSelector, LensBinPairSelector

# Source measurements (weak lensing shear)
source_selector = SourceBinPairSelector()
source_pairs = make_binned_two_point_filtered(all_bins, source_selector)

# Lens measurements (galaxy counts)
lens_selector = LensBinPairSelector()
lens_pairs = make_binned_two_point_filtered(all_bins, lens_selector)

print(f"Source (shear) pairs: {len(source_pairs)}")
print(f"Lens (counts) pairs: {len(lens_pairs)}")

Source (shear) pairs: 15
Lens (counts) pairs: 15

Source-lens cross-correlations:

from firecrown.metadata_types import SourceLensBinPairSelector

# Source × Lens cross-correlations
source_lens_selector = SourceLensBinPairSelector()
source_lens_pairs = make_binned_two_point_filtered(all_bins, source_lens_selector)

print(f"Source × Lens pairs: {len(source_lens_pairs)}")

Source × Lens pairs: 25

Display source-lens pairs:

import pandas as pd
from IPython.display import Markdown

source_lens_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in source_lens_pairs[::5]  # Show every 5th pair
]

df = pd.DataFrame(source_lens_pairs_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source1	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source2	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source3	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source4	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS

Named Pair Selectors

Select specific bin name combinations explicitly:

from firecrown.metadata_types import NamedBinPairSelector

# Select specific pairs by name
named_selector = NamedBinPairSelector(
    names=[
        ("lsst_y1_lens0", "lsst_y1_lens1"),
        ("lsst_y1_source0", "lsst_y1_source0"),
        ("lsst_y1_source0", "lsst_y1_lens0"),
    ]
)
named_pairs = make_binned_two_point_filtered(all_bins, named_selector)

print(f"Named pairs: {len(named_pairs)}")

Named pairs: 3

Note: Matching is order-dependent. ("bin0", "bin1") is different from ("bin1", "bin0"). Include both if you want symmetric matching.

Display named pairs:

import pandas as pd
from IPython.display import Markdown

named_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in named_pairs
]

df = pd.DataFrame(named_pairs_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_source0	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source0	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_lens0	lsst_y1_lens1	Galaxies.COUNTS	Galaxies.COUNTS
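
Because matching is order-dependent, one way to get symmetric matching is to expand the list of names before constructing the selector. `symmetric_names` below is a hypothetical helper written for this tutorial, not a Firecrown function; its result could be passed as the `names` argument.

```python
def symmetric_names(names: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return `names` plus each reversed pair, without duplicates, order preserved."""
    result: list[tuple[str, str]] = []
    for left, right in names:
        for candidate in ((left, right), (right, left)):
            if candidate not in result:
                result.append(candidate)
    return result


names = symmetric_names([("bin0", "bin1"), ("bin2", "bin2")])
print(names)
```

Pairs of a bin with itself are added only once, since reversing them changes nothing.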

Index-Based Selectors

Index-based selectors filter pairs using the text prefix and the numeric index in each bin name. They are useful for implementing physical constraints that rely on the redshift-ordering convention.

Understanding Bin Name Structure

Bin names follow the pattern <prefix><number>:

lsst_y1_source0 → prefix: lsst_y1_source, index: 0
lsst_y1_lens3 → prefix: lsst_y1_lens, index: 3
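
The split into prefix and index can be sketched with a regular expression. This is only an illustration of the <prefix><number> pattern; `split_bin_name` is a hypothetical helper, and Firecrown's own parsing may differ in detail.

```python
import re

# Lazy prefix followed by the trailing run of digits.
_BIN_NAME = re.compile(r"^(?P<prefix>.*?)(?P<index>\d+)$")


def split_bin_name(name: str) -> tuple[str, int]:
    """Split a bin name such as 'lsst_y1_lens3' into ('lsst_y1_lens', 3)."""
    match = _BIN_NAME.match(name)
    if match is None:
        raise ValueError(f"bin name {name!r} has no trailing numeric index")
    return match.group("prefix"), int(match.group("index"))


print(split_bin_name("lsst_y1_source0"))
print(split_bin_name("lsst_y1_lens3"))
```

Note that the digit inside `y1` stays in the prefix; only the trailing digits are treated as the index.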

NameDiffBinPairSelector: Index-Based Filtering

This selector is useful when your bins follow a naming convention where the numeric index correlates with some property (typically redshift). It filters based on:

Whether bin prefixes match (same_name_prefix)
The difference between bin indices (neighbors_diff)

The index difference is computed as: left_index - right_index

from firecrown.metadata_types import NameDiffBinPairSelector

# Example 1: Same prefix, adjacent or identical bins
# Keeps bins differing by 0, ±1 in index
same_prefix_selector = NameDiffBinPairSelector(
    same_name_prefix=True,
    neighbors_diff=[0, 1, -1]
)

# Example 2: Different prefixes, lens at lower/equal redshift
# For galaxy-galaxy lensing: source_index - lens_index ∈ {0, 1}
diff_prefix_selector = NameDiffBinPairSelector(
    same_name_prefix=False,
    neighbors_diff=[0, 1]
)

# Apply to our bins
same_prefix_pairs = make_binned_two_point_filtered(all_bins, same_prefix_selector)
diff_prefix_pairs = make_binned_two_point_filtered(all_bins, diff_prefix_selector)

print(f"Same prefix, nearby indices: {len(same_prefix_pairs)} pairs")
print(f"Different prefix, controlled indices: {len(diff_prefix_pairs)} pairs")

Same prefix, nearby indices: 18 pairs
Different prefix, controlled indices: 9 pairs

Key behaviors:

Configuration	Keeps	Rejects	Use Case
`same_name_prefix=True` `neighbors_diff=1`	(bin0, bin1) (bin1, bin0)	(bin0, bin2) (bin2, bin2) (src0, bin0)	Adjacent bins, same type
`same_name_prefix=False` `neighbors_diff=[0, 1]`	(src0, lens0) (src1, lens0) (src1, lens1)	(src0, lens2) (lens1, src0)	Cross-type, redshift constraint
`same_name_prefix=True` `neighbors_diff=0`	(bin0, bin0) (bin1, bin1)	(bin0, bin1)	Exact auto-correlations

AutoNameDiffBinPairSelector

Convenience selector for same_name_prefix=True:

from firecrown.metadata_types import AutoNameDiffBinPairSelector

# Equivalent to NameDiffBinPairSelector(same_name_prefix=True, neighbors_diff=[0, 1, -1])
auto_neighbor_selector = AutoNameDiffBinPairSelector(neighbors_diff=[0, 1, -1])
auto_neighbor_pairs = make_binned_two_point_filtered(all_bins, auto_neighbor_selector)

print(f"Auto with neighbors: {len(auto_neighbor_pairs)} pairs")

Auto with neighbors: 18 pairs

Use case: When bins follow a naming convention with increasing redshift, this selects correlations between nearby redshift bins of the same type (e.g., source bins or lens bins).

CrossNameDiffBinPairSelector

Convenience selector for same_name_prefix=False:

from firecrown.metadata_types import CrossNameDiffBinPairSelector

# Equivalent to NameDiffBinPairSelector(same_name_prefix=False, neighbors_diff=[0, 1])
cross_neighbor_selector = CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
cross_neighbor_pairs = make_binned_two_point_filtered(all_bins, cross_neighbor_selector)

print(f"Cross-type with neighbors: {len(cross_neighbor_pairs)} pairs")

Cross-type with neighbors: 9 pairs

Use case: When bins follow a naming convention with increasing redshift, this selects cross-type correlations (e.g., source-lens) with redshift constraints based on index differences.

Display some cross-type neighbor pairs:

import pandas as pd
from IPython.display import Markdown

cross_neighbor_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        # Last character is the bin index; valid here because indices are single digits
        "index-diff": int(pair.x.bin_name[-1]) - int(pair.y.bin_name[-1]),
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in cross_neighbor_pairs[::5]  # Every 5th pair
]

df = pd.DataFrame(cross_neighbor_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	index-diff	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_lens0	0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source3	lsst_y1_lens2	1	Galaxies.SHEAR_E	Galaxies.COUNTS

Logical Combinations

Selectors support logical operations to build complex criteria:

AND Operator (`&`)

Keep pairs that satisfy both conditions:

# Auto-correlations that are also source measurements
auto_source_selector = AutoNameBinPairSelector() & SourceBinPairSelector()
auto_source_pairs = make_binned_two_point_filtered(all_bins, auto_source_selector)

print(f"Auto-correlation source pairs: {len(auto_source_pairs)}")

Auto-correlation source pairs: 5

Display auto-correlation source pairs:

import pandas as pd
from IPython.display import Markdown

auto_source_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in auto_source_pairs
]

df = pd.DataFrame(auto_source_pairs_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_source0	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source1	lsst_y1_source1	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source2	lsst_y1_source2	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source3	lsst_y1_source3	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source4	lsst_y1_source4	Galaxies.SHEAR_E	Galaxies.SHEAR_E

OR Operator (`|`)

Keep pairs that satisfy either condition:

# Either auto-correlations OR source-lens cross-correlations
mixed_selector = AutoNameBinPairSelector() | SourceLensBinPairSelector()
mixed_pairs = make_binned_two_point_filtered(all_bins, mixed_selector)

print(f"Auto OR source-lens pairs: {len(mixed_pairs)}")

Auto OR source-lens pairs: 35

Display mixed pairs:

import pandas as pd
from IPython.display import Markdown

mixed_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in mixed_pairs[::5]  # Show every 5th pair
]

df = pd.DataFrame(mixed_pairs_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_source0	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source0	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source1	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source2	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source3	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source4	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_lens0	lsst_y1_lens0	Galaxies.COUNTS	Galaxies.COUNTS

NOT Operator (`~`)

Keep pairs that do not satisfy the condition:

# Everything except auto-correlations (i.e., cross-correlations)
not_auto_selector = ~AutoNameBinPairSelector()
not_auto_pairs = make_binned_two_point_filtered(all_bins, not_auto_selector)

print(f"Non-auto (cross) pairs: {len(not_auto_pairs)}")

Non-auto (cross) pairs: 45

Display non-auto pairs:

import pandas as pd
from IPython.display import Markdown

not_auto_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in not_auto_pairs[::5]  # Show every 5th pair
]

df = pd.DataFrame(not_auto_pairs_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_source1	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source1	lsst_y1_source3	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source0	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source1	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source2	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source3	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source4	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_lens0	lsst_y1_lens1	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens1	lsst_y1_lens3	Galaxies.COUNTS	Galaxies.COUNTS
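
The three operators above can be pictured with ordinary Python operator overloading. The `Predicate` class below is a hypothetical stand-in that wraps a function from a pair of bin names to a boolean. It is a sketch of the general technique, not Firecrown's implementation.

```python
from dataclasses import dataclass
from typing import Callable

Pair = tuple[str, str]


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for a selector: wraps a function Pair -> bool."""

    fn: Callable[[Pair], bool]

    def __call__(self, pair: Pair) -> bool:
        return self.fn(pair)

    def __and__(self, other: "Predicate") -> "Predicate":
        # a & b: keep when both predicates keep
        return Predicate(lambda pair: self(pair) and other(pair))

    def __or__(self, other: "Predicate") -> "Predicate":
        # a | b: keep when at least one predicate keeps
        return Predicate(lambda pair: self(pair) or other(pair))

    def __invert__(self) -> "Predicate":
        # ~a: keep exactly the pairs that a discards
        return Predicate(lambda pair: not self(pair))


auto = Predicate(lambda pair: pair[0] == pair[1])
source = Predicate(lambda pair: all(name.startswith("source") for name in pair))

pairs = [("source0", "source0"), ("source0", "source1"), ("lens0", "lens0")]
print([p for p in pairs if (auto & source)(p)])
print([p for p in pairs if (auto | source)(p)])
print([p for p in pairs if (~auto)(p)])
```

Each operator returns a new predicate, so combined selectors can themselves be combined further.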

Complex Combinations

Combine multiple operators:

# (source auto-correlations) OR (lens auto-correlations)
complex_selector = (
    (AutoNameBinPairSelector() & SourceBinPairSelector()) |
    (AutoNameBinPairSelector() & LensBinPairSelector())
)
complex_pairs = make_binned_two_point_filtered(all_bins, complex_selector)

print(f"Complex selection: {len(complex_pairs)}")

Complex selection: 10

Display complex pairs:

import pandas as pd
from IPython.display import Markdown

complex_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in complex_pairs[::2]  # Show every 2nd pair
]
df = pd.DataFrame(complex_pairs_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_source0	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source2	lsst_y1_source2	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source4	lsst_y1_source4	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_lens1	lsst_y1_lens1	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens3	lsst_y1_lens3	Galaxies.COUNTS	Galaxies.COUNTS
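
By Boolean distributivity, (Auto AND Source) OR (Auto AND Lens) selects the same pairs as Auto AND (Source OR Lens). The sketch below checks this on stand-in bin names with plain functions. It assumes each unordered pair is considered once and does not use Firecrown's selector classes.

```python
from itertools import combinations_with_replacement

bins = [f"source{i}" for i in range(5)] + [f"lens{i}" for i in range(5)]
pairs = list(combinations_with_replacement(bins, 2))


def is_auto(pair: tuple[str, str]) -> bool:
    return pair[0] == pair[1]


def is_source(pair: tuple[str, str]) -> bool:
    return all(name.startswith("source") for name in pair)


def is_lens(pair: tuple[str, str]) -> bool:
    return all(name.startswith("lens") for name in pair)


# Expanded form, as written in the example above.
expanded = [
    p for p in pairs if (is_auto(p) and is_source(p)) or (is_auto(p) and is_lens(p))
]
# Factored form: the shared condition is applied once.
factored = [p for p in pairs if is_auto(p) and (is_source(p) or is_lens(p))]

print(len(expanded), expanded == factored)
```

Which form to write is a matter of readability; the factored form states the shared condition only once.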

Composite Selectors

Composite selectors are specialized selectors that combine multiple simpler selectors according to specific logic. Unlike using logical operators (&, |, ~) directly, composite selectors implement domain-specific selection patterns as reusable classes.

Understanding Composite Selectors

A CompositeSelector is a base class for selectors that internally manage a list of other selectors. When you use composite selectors, you benefit from:

Encapsulation: Complex logic is packaged into a single, named selector
Reusability: Common patterns (like “auto-correlations”, “3×2pt”) can be used consistently
Clarity: Code intent is clearer with descriptive selector names

Example: The ThreeTwoBinPairSelector

The ThreeTwoBinPairSelector is a composite selector designed for “3×2pt” analyses, which combine three types of two-point correlations:

Cosmic shear (source × source)
Galaxy-galaxy lensing (source × lens)
Galaxy clustering (lens × lens)

This is a standard observable combination in weak lensing surveys like DES, HSC, and LSST.

from firecrown.metadata_types import ThreeTwoBinPairSelector

# Create a 3×2pt selector
three_two_selector = ThreeTwoBinPairSelector(
    source_dist=1, lens_dist=1, source_lens_dist=5
)

# Apply to our bins
three_two_pairs = make_binned_two_point_filtered(all_bins, three_two_selector)

print(f"3×2pt pairs: {len(three_two_pairs)}")

3×2pt pairs: 28

With the parameters above, the selection shown below contains:

Cosmic shear (source × source): auto-correlations and adjacent-bin cross-correlations
Galaxy-galaxy lensing (source × lens): pairs whose source index exceeds the lens index
Galaxy clustering (lens × lens): auto-correlations and adjacent-bin cross-correlations

Display 3×2pt pairs:

import pandas as pd
from IPython.display import Markdown

three_two_pairs_table = [
    {
        "bin-x": pair.x.bin_name,
        "bin-y": pair.y.bin_name,
        "measurement-x": str(pair.x_measurement),
        "measurement-y": str(pair.y_measurement),
    }
    for pair in three_two_pairs
]

df = pd.DataFrame(three_two_pairs_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	measurement-x	measurement-y
lsst_y1_source0	lsst_y1_source0	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source0	lsst_y1_source1	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source1	lsst_y1_source1	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source1	lsst_y1_source2	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source2	lsst_y1_source2	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source2	lsst_y1_source3	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source3	lsst_y1_source3	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source3	lsst_y1_source4	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source4	lsst_y1_source4	Galaxies.SHEAR_E	Galaxies.SHEAR_E
lsst_y1_source1	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source2	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source2	lsst_y1_lens1	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source3	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source3	lsst_y1_lens1	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source3	lsst_y1_lens2	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source4	lsst_y1_lens0	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source4	lsst_y1_lens1	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source4	lsst_y1_lens2	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_source4	lsst_y1_lens3	Galaxies.SHEAR_E	Galaxies.COUNTS
lsst_y1_lens0	lsst_y1_lens0	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens0	lsst_y1_lens1	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens1	lsst_y1_lens1	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens1	lsst_y1_lens2	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens2	lsst_y1_lens2	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens2	lsst_y1_lens3	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens3	lsst_y1_lens3	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens3	lsst_y1_lens4	Galaxies.COUNTS	Galaxies.COUNTS
lsst_y1_lens4	lsst_y1_lens4	Galaxies.COUNTS	Galaxies.COUNTS
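
The total of 28 can be reconstructed from the table above. The index rules in this sketch are read off the displayed output, not taken from the ThreeTwoBinPairSelector implementation, and no claim is made about how source_dist, lens_dist, or source_lens_dist map onto them in general.

```python
n_bins = 5  # indices 0..4 for both source and lens bins

# Source x source: same bin or the next bin up (9 rows in the table).
shear = [(i, j) for i in range(n_bins) for j in range(i, n_bins) if j - i <= 1]

# Source x lens: source index strictly greater than lens index (10 rows).
lensing = [
    (source, lens)
    for source in range(n_bins)
    for lens in range(n_bins)
    if source > lens
]

# Lens x lens: same rule as source x source (9 rows).
clustering = [(i, j) for i in range(n_bins) for j in range(i, n_bins) if j - i <= 1]

total = len(shear) + len(lensing) + len(clustering)
print(len(shear), len(lensing), len(clustering), total)
```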

Creating Custom Composite Selectors

You can create your own composite selectors for domain-specific patterns. Here’s how to implement one:

from typing import Any
from firecrown.metadata_types import CompositeSelector, BinPairSelector
from firecrown.metadata_types import register_bin_pair_selector
from pydantic import Field


@register_bin_pair_selector
class CustomTwoTwoBinPairSelector(CompositeSelector):
    """Select 2×2pt: cosmic shear + galaxy clustering (no galaxy-galaxy lensing)."""

    kind: str = "custom_2x2pt"

    def model_post_init(self, _: Any, /) -> None:
        self._impl = SourceBinPairSelector() | LensBinPairSelector()


# Use the custom selector
two_two_selector = CustomTwoTwoBinPairSelector()
two_two_pairs = make_binned_two_point_filtered(all_bins, two_two_selector)

print(f"Custom 2×2pt pairs (no galaxy-galaxy lensing): {len(two_two_pairs)}")

Custom 2×2pt pairs (no galaxy-galaxy lensing): 30

When to Use Composite Selectors

Use composite selectors when:

You have a well-defined, reusable selection pattern
The pattern combines multiple criteria in a specific way
You want to give the pattern a meaningful name (like “3×2pt”)
You need to serialize and share the pattern across analyses

Use direct logical operators (&, |, ~) when:

You need a one-off combination
The logic is simple and self-explanatory
You’re experimenting with different criteria

Combining Composite Selectors with Logical Operators

Composite selectors can be combined with other selectors using logical operators:

# 3×2pt but remove (lsst_y1_source4, lsst_y1_lens3) pair
three_two_rm = ThreeTwoBinPairSelector() & ~NamedBinPairSelector(
    names=[("lsst_y1_source4", "lsst_y1_lens3")]
)

# Use the combined selector
three_two_rm_pairs = make_binned_two_point_filtered(all_bins, three_two_rm)

print(
    f"3×2pt pairs with (lsst_y1_source4, lsst_y1_lens3) removed: {len(three_two_rm_pairs)}"
)

3×2pt pairs with (lsst_y1_source4, lsst_y1_lens3) removed: 39

This flexibility allows you to start with standard patterns (like 3×2pt) and refine them as needed for your specific analysis.

Advanced Physical Constraints

Beyond the pre-built selectors, you can construct sophisticated physical selections by combining multiple building blocks.

Combining Multiple Requirements

Real analyses often need to combine several constraints:

from firecrown.metadata_types import (
    SourceLensBinPairSelector,
    CrossNameDiffBinPairSelector,
    NamedBinPairSelector,
)

# Galaxy-galaxy lensing with multiple constraints
# (assumes bins follow naming convention where index correlates with redshift)
multi_constrained_gl = (
    SourceLensBinPairSelector()  # Select (Source, Lens) measurement pairs
    & CrossNameDiffBinPairSelector(neighbors_diff=[0, 1, 2])  # Index-based constraint
    & ~NamedBinPairSelector(  # Remove specific problematic pairs
        names=[
            ("lsst_y1_source4", "lsst_y1_lens3"),  # Example: noisy pair
        ]
    )
)

multi_constrained_gl_pairs = make_binned_two_point_filtered(all_bins, multi_constrained_gl)
print(f"Multi-constrained galaxy-galaxy lensing: {len(multi_constrained_gl_pairs)} pairs")

Multi-constrained galaxy-galaxy lensing: 11 pairs

This combines:

Measurement type filtering: Only source-lens pairs
Index-based constraints: Keep only pairs whose bin indices differ by 0, 1, or 2 (assumes a naming convention in which the index tracks redshift)
Manual exclusions: Remove specific pairs known to be problematic
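
The way these three constraints narrow the selection can be traced by hand. The following is a minimal standalone sketch, not Firecrown code: it assumes five source and five lens bins indexed 0 to 4 (as the bin names in this tutorial suggest) and models each constraint as a plain filter on (source index, lens index) tuples. The variable names are hypothetical.

```python
from itertools import product

# Assumption: five source and five lens bins, indexed 0..4.
N_BINS = 5

# Constraint 1: every (source, lens) pair, the analogue of the
# measurement-type filter.
source_lens = list(product(range(N_BINS), range(N_BINS)))

# Constraint 2: keep pairs with source_idx - lens_idx in {0, 1, 2}.
allowed_diffs = {0, 1, 2}
index_ok = [(s, l) for s, l in source_lens if s - l in allowed_diffs]

# Constraint 3: drop the manually excluded pair (source4, lens3).
excluded = {(4, 3)}
final = [pair for pair in index_ok if pair not in excluded]

print(len(source_lens), len(index_ok), len(final))  # 25 12 11
```

Under these assumptions the final count matches the 11 pairs reported above: 25 source-lens pairs, 12 after the index constraint, 11 after the manual exclusion.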

TypeSourceBinPairSelector: Filtering by Data Provenance

The TypeSourceBinPairSelector filters based on how the data was obtained (e.g., spectroscopic vs photometric redshifts):

from firecrown.metadata_types import TypeSource, TypeSourceBinPairSelector

# Only correlate bins with matching data source
type_source_selector = TypeSourceBinPairSelector(
    type_source=TypeSource.DEFAULT
)

# Note: In our generated bins, all have the same type_source
# so this doesn't filter anything here. In real surveys, you might have:
# - TypeSource.PHOTOMETRIC for photo-z bins
# - TypeSource.SPECTROSCOPIC for spec-z bins
# - Custom TypeSource values for different surveys

type_filtered_pairs = make_binned_two_point_filtered(all_bins, type_source_selector)
print(f"Type-source filtered pairs: {len(type_filtered_pairs)}")

Type-source filtered pairs: 55

Use cases:

Separate analyses for spectroscopic vs photometric samples
Avoid mixing bins from different surveys with incompatible systematics
Ensure consistent redshift calibration within correlations

Building-Block Philosophy in Practice

The power of Firecrown’s selector system comes from composition. Here’s a typical workflow:

Step 1: Start with Measurement Types

# Start with the fundamental building block
step1 = SourceLensBinPairSelector()  # Selects (Source, Lens) measurement pairs
step1_pairs = make_binned_two_point_filtered(all_bins, step1)
print(f"Step 1 (measurement types only): {len(step1_pairs)} pairs")

Step 1 (measurement types only): 25 pairs

Step 2: Add Constraints Based on Your Data

# IF your bins follow a naming convention, add index-based filtering
step2 = (
    SourceLensBinPairSelector()
    & CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
)
step2_pairs = make_binned_two_point_filtered(all_bins, step2)
print(f"Step 2 (+ index constraint): {len(step2_pairs)} pairs")
print(f"  Filtered out {len(step1_pairs) - len(step2_pairs)} pairs")

Step 2 (+ index constraint): 9 pairs
  Filtered out 16 pairs

Step 3: Refine Based on Data Quality

# Remove specific pairs if needed
step3 = (
    SourceLensBinPairSelector()
    & CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
    & ~NamedBinPairSelector(
        names=[
            ("lsst_y1_source4", "lsst_y1_lens4"),  # Example problematic pair
        ]
    )
)
step3_pairs = make_binned_two_point_filtered(all_bins, step3)
print(f"Step 3 (+ manual exclusions): {len(step3_pairs)} pairs")
print(f"  Filtered out {len(step2_pairs) - len(step3_pairs)} additional pairs")

Step 3 (+ manual exclusions): 8 pairs
  Filtered out 1 additional pairs

When to Create a Custom CompositeSelector

If you find yourself using the same complex selector combination repeatedly, consider creating a custom CompositeSelector class:

from firecrown.metadata_types import CompositeSelector, register_bin_pair_selector
from typing import Any

@register_bin_pair_selector
class MyCustomSelector(CompositeSelector):
    """Custom selector for my specific analysis."""
    
    kind: str = "my-custom-selector"
    max_separation: int = 2
    
    def model_post_init(self, _: Any, /) -> None:
        self._impl = (
            SourceLensBinPairSelector()
            & CrossNameDiffBinPairSelector(
                neighbors_diff=list(range(0, self.max_separation + 1))  # [0, max_separation]
            )
        )

This makes your code more maintainable and allows you to serialize/deserialize the selector configuration.

Practical Example: Optimizing Signal-to-Noise

Combine physical constraints with signal-to-noise optimization:

# High-S/N selection: nearby bins only, exclude far-separated pairs
high_sn_3x2pt = (
    ThreeTwoBinPairSelector(
        source_dist=2,   # Very conservative
        lens_dist=2,
        source_lens_dist=1
    )
    # Could add more constraints here
)

high_sn_pairs = make_binned_two_point_filtered(all_bins, high_sn_3x2pt)
print(f"\nHigh-S/N 3×2pt selection: {len(high_sn_pairs)} pairs")
print("Trade-off: Fewer pairs but stronger signals per pair")


High-S/N 3×2pt selection: 28 pairs
Trade-off: Fewer pairs but stronger signals per pair

Physical 3×2pt Selection with Redshift Constraints

The ThreeTwoBinPairSelector implements the standard “3×2pt” cosmological analysis, which combines:

Cosmic shear (source × source): Weak lensing auto-correlations
Galaxy clustering (lens × lens): Galaxy position auto-correlations
Galaxy-galaxy lensing (source × lens): Cross-correlations

Crucially, it uses NameDiffBinPairSelector internally to enforce index-based constraints, which stand in for physical redshift ordering when bin indices increase with redshift.

Understanding the Distance Parameters

The selector has three parameters controlling which bin pairs are included:

from firecrown.metadata_types import ThreeTwoBinPairSelector

# Default: fairly permissive
default_3x2pt = ThreeTwoBinPairSelector(
    source_dist=5,       # Cosmic shear: source_i - source_j ∈ [-5, 5]
    lens_dist=5,         # Clustering: lens_i - lens_j ∈ [-5, 5]
    source_lens_dist=5   # Galaxy-galaxy lensing: source_i - lens_j ∈ {1,2,3,4,5}
)

default_pairs = make_binned_two_point_filtered(all_bins, default_3x2pt)
print(f"Default 3×2pt: {len(default_pairs)} pairs")

Default 3×2pt: 40 pairs

What each parameter does:

source_dist: For cosmic shear, controls which source bin index pairs are included
- Difference: source_i - source_j ∈ [-source_dist, source_dist]
- Example: source_dist=5 allows differences in {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5}
- Larger values → more cross-redshift correlations → more data, but weaker signals
lens_dist: For galaxy clustering, same logic for lens bin pairs
- Difference: lens_i - lens_j ∈ [-lens_dist, lens_dist]
source_lens_dist: For galaxy-galaxy lensing, enforces source_index - lens_index ∈ {1, 2, ..., source_lens_dist}
- Does NOT include 0: Excludes same-index pairs to avoid mixing source/lens from same redshift bin
- Only positive differences: Ensures lens is at lower redshift than source
- Physically motivated: Lens must be in front of source
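
To make the three rules concrete, here is a small standalone sketch that counts the pairs they admit. It does not call Firecrown: `count_3x2pt_pairs` is a hypothetical helper, and it assumes bins indexed from 0, unordered pairs for same-type correlations, and the index rules exactly as described in the list above.

```python
from itertools import combinations_with_replacement, product


def count_3x2pt_pairs(n_source, n_lens, source_dist, lens_dist, source_lens_dist):
    """Count bin pairs admitted by the index rules (illustrative only)."""
    # Cosmic shear: unordered source pairs with |i - j| <= source_dist.
    shear = sum(
        1
        for i, j in combinations_with_replacement(range(n_source), 2)
        if abs(i - j) <= source_dist
    )
    # Clustering: unordered lens pairs with |i - j| <= lens_dist.
    clustering = sum(
        1
        for i, j in combinations_with_replacement(range(n_lens), 2)
        if abs(i - j) <= lens_dist
    )
    # Galaxy-galaxy lensing: source_idx - lens_idx in {1, ..., source_lens_dist}.
    ggl = sum(
        1
        for s, l in product(range(n_source), range(n_lens))
        if 1 <= s - l <= source_lens_dist
    )
    return shear + clustering + ggl


print(count_3x2pt_pairs(5, 5, 5, 5, 5))  # 40
print(count_3x2pt_pairs(5, 5, 2, 2, 2))  # 31
print(count_3x2pt_pairs(5, 5, 2, 2, 1))  # 28
```

With five source and five lens bins, these counts agree with the pair counts this tutorial prints for the same parameter values.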

Physical Motivation for Distance Parameters

Why limit redshift separation?

Signal-to-noise decreases for widely separated bins
Computational cost increases with more pairs
Very distant bins may have negligible correlation

Why exclude same-index source-lens pairs?

Standard practice: source and lens samples are typically drawn from the same parent population, just split by different selection criteria
Correlating them at the same redshift would mix the selection effects in complicated ways
Different analyses may handle this differently, but ThreeTwoBinPairSelector follows the conservative standard

Tuning for Your Analysis

# Conservative: only nearby bins (highest S/N)
conservative_3x2pt = ThreeTwoBinPairSelector(
    source_dist=2,       # Source bins up to 2 indices apart
    lens_dist=2,         # Lens bins up to 2 indices apart
    source_lens_dist=2   # Lens 1 or 2 indices below the source
)
conservative_pairs = make_binned_two_point_filtered(all_bins, conservative_3x2pt)

# Aggressive: More cross-correlations (more data, weaker signals)
aggressive_3x2pt = ThreeTwoBinPairSelector(
    source_dist=10,
    lens_dist=10,
    source_lens_dist=8
)
aggressive_pairs = make_binned_two_point_filtered(all_bins, aggressive_3x2pt)

print(f"Conservative 3×2pt: {len(conservative_pairs)} pairs")
print(f"Default 3×2pt: {len(default_pairs)} pairs")
print(f"Aggressive 3×2pt: {len(aggressive_pairs)} pairs")

Conservative 3×2pt: 31 pairs
Default 3×2pt: 40 pairs
Aggressive 3×2pt: 40 pairs

Verifying Physical Consistency

Let’s verify that every galaxy-galaxy lensing pair has its lens bin at a lower index than its source bin:

import pandas as pd
from IPython.display import Markdown, display
import re

# Extract galaxy-galaxy lensing pairs
gl_pairs = [
    p for p in default_pairs
    if "source" in p.x.bin_name and "lens" in p.y.bin_name
]

# Parse indices
def get_index(bin_name):
    match = re.search(r'(\d+)$', bin_name)
    return int(match.group(1)) if match else None

gl_table = [
    {
        "source-bin": p.x.bin_name,
        "lens-bin": p.y.bin_name,
        "source-idx": get_index(p.x.bin_name),
        "lens-idx": get_index(p.y.bin_name),
        "diff": get_index(p.x.bin_name) - get_index(p.y.bin_name),
    }
    for p in gl_pairs[:10]  # Show first 10
]

df = pd.DataFrame(gl_table)
print("\nGalaxy-galaxy lensing pairs (source × lens):")
print("Note: diff = source_idx - lens_idx should be positive (lens at lower z)\n")
display(Markdown(df.to_markdown(index=False)))


Galaxy-galaxy lensing pairs (source × lens):
Note: diff = source_idx - lens_idx should be positive (lens at lower z)

source-bin	lens-bin	source-idx	lens-idx	diff
lsst_y1_source1	lsst_y1_lens0	1	0	1
lsst_y1_source2	lsst_y1_lens0	2	0	2
lsst_y1_source2	lsst_y1_lens1	2	1	1
lsst_y1_source3	lsst_y1_lens0	3	0	3
lsst_y1_source3	lsst_y1_lens1	3	1	2
lsst_y1_source3	lsst_y1_lens2	3	2	1
lsst_y1_source4	lsst_y1_lens0	4	0	4
lsst_y1_source4	lsst_y1_lens1	4	1	3
lsst_y1_source4	lsst_y1_lens2	4	2	2
lsst_y1_source4	lsst_y1_lens3	4	3	1
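
The same check can be made programmatically rather than by eye. This is a minimal sketch that hard-codes the ten bin-name pairs from the table above instead of reading them from Firecrown objects; `trailing_index` is a hypothetical helper mirroring `get_index`.

```python
import re

# Bin-name pairs copied from the table above: (source bin, lens bin).
pairs = [
    ("lsst_y1_source1", "lsst_y1_lens0"),
    ("lsst_y1_source2", "lsst_y1_lens0"),
    ("lsst_y1_source2", "lsst_y1_lens1"),
    ("lsst_y1_source3", "lsst_y1_lens0"),
    ("lsst_y1_source3", "lsst_y1_lens1"),
    ("lsst_y1_source3", "lsst_y1_lens2"),
    ("lsst_y1_source4", "lsst_y1_lens0"),
    ("lsst_y1_source4", "lsst_y1_lens1"),
    ("lsst_y1_source4", "lsst_y1_lens2"),
    ("lsst_y1_source4", "lsst_y1_lens3"),
]


def trailing_index(bin_name: str) -> int:
    """Return the integer at the end of a bin name, or raise if there is none."""
    match = re.search(r"(\d+)$", bin_name)
    if match is None:
        raise ValueError(f"no trailing index in bin name: {bin_name!r}")
    return int(match.group(1))


diffs = [trailing_index(src) - trailing_index(lens) for src, lens in pairs]
all_positive = all(d > 0 for d in diffs)
print(diffs)
print(all_positive)  # True
```

Raising on a missing index, rather than returning None, makes a bin name that breaks the naming convention fail loudly instead of producing a confusing arithmetic error later.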

When generating metadata from scratch, apply selectors to control which pairs are created:

import numpy as np
from firecrown.metadata_types import TwoPointHarmonic, AutoBinPairSelector

# Create selector for auto-correlations (name AND measurement)
auto_both_selector = AutoBinPairSelector()

# Filter pairs during generation
auto_both_pairs = make_binned_two_point_filtered(all_bins, auto_both_selector)

# Create harmonic-space metadata
ells = np.unique(np.geomspace(2, 2000, 128).astype(int))
auto_harmonic_metadata = [TwoPointHarmonic(XY=xy, ells=ells) for xy in auto_both_pairs]

print(f"Auto-correlation harmonic metadata: {len(auto_harmonic_metadata)}")

Auto-correlation harmonic metadata: 10

Using Selectors with SACC Extraction

When extracting from SACC files, apply selectors to filter which correlations are loaded:

from firecrown.likelihood.factories import load_sacc_data
from firecrown.metadata_functions import extract_all_real_metadata

# Load SACC file
sacc_data = load_sacc_data("../tests/sacc_data.hdf5")

# Extract only source auto-correlations
source_auto_selector = SourceBinPairSelector() & AutoNameBinPairSelector()
source_auto_metadata = extract_all_real_metadata(
    sacc_data, bin_pair_selector=source_auto_selector
)

print(f"Source auto-correlations from SACC: {len(source_auto_metadata)}")

Source auto-correlations from SACC: 8

Display the source auto-correlations:

import pandas as pd
from IPython.display import Markdown

source_auto_table = [
    {
        "bin-x": real.XY.x.bin_name,
        "bin-y": real.XY.y.bin_name,
        "measurement-x": str(real.XY.x_measurement),
        "measurement-y": str(real.XY.y_measurement),
    }
    for real in source_auto_metadata
]

df = pd.DataFrame(source_auto_table)
Markdown(df.to_markdown(index=False))

bin-x	bin-y	measurement-x	measurement-y
src0	src0	Galaxies.PART_OF_XI_MINUS	Galaxies.PART_OF_XI_MINUS
src1	src1	Galaxies.PART_OF_XI_MINUS	Galaxies.PART_OF_XI_MINUS
src2	src2	Galaxies.PART_OF_XI_MINUS	Galaxies.PART_OF_XI_MINUS
src3	src3	Galaxies.PART_OF_XI_MINUS	Galaxies.PART_OF_XI_MINUS
src0	src0	Galaxies.PART_OF_XI_PLUS	Galaxies.PART_OF_XI_PLUS
src1	src1	Galaxies.PART_OF_XI_PLUS	Galaxies.PART_OF_XI_PLUS
src2	src2	Galaxies.PART_OF_XI_PLUS	Galaxies.PART_OF_XI_PLUS
src3	src3	Galaxies.PART_OF_XI_PLUS	Galaxies.PART_OF_XI_PLUS

Compare with extracting all data:

# Extract everything (no selector)
all_metadata = extract_all_real_metadata(sacc_data)

print(f"All correlations in SACC: {len(all_metadata)}")
print(f"Filtered to source auto: {len(source_auto_metadata)}")

All correlations in SACC: 45
Filtered to source auto: 8

Serialization

Selectors can be serialized to YAML for reuse:

from firecrown.utils import base_model_to_yaml, base_model_from_yaml

# Create a complex selector
selector = (AutoNameBinPairSelector() & SourceBinPairSelector()) | LensBinPairSelector()

# Serialize to YAML
selector_yaml = base_model_to_yaml(selector)
print("Serialized selector:")
print(selector_yaml)

Serialized selector:
kind: or
pair_selectors:
- kind: and
  pair_selectors:
  - {kind: auto-name}
  - {kind: source}
- {kind: lens}

Load from YAML:

from firecrown.metadata_types import BinPairSelector

# Deserialize
loaded_selector = base_model_from_yaml(BinPairSelector, selector_yaml)

# Use the loaded selector
loaded_pairs = make_binned_two_point_filtered(all_bins, loaded_selector)
print(f"Pairs from loaded selector: {len(loaded_pairs)}")

Pairs from loaded selector: 20
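
To see how the nested `kind` / `pair_selectors` structure maps onto selection logic, the serialized tree can be evaluated by hand. This is a standalone sketch, not Firecrown's implementation: the meaning assigned to each `kind` (`auto-name` keeps a bin paired with itself, `source` and `lens` require both bins to be of that type) is an assumed reading of this tutorial, and bins are modeled as hypothetical `(type, index)` tuples with five source and five lens bins.

```python
from itertools import combinations_with_replacement

# The serialized selector shown above, written as a plain dict.
tree = {
    "kind": "or",
    "pair_selectors": [
        {
            "kind": "and",
            "pair_selectors": [{"kind": "auto-name"}, {"kind": "source"}],
        },
        {"kind": "lens"},
    ],
}


def keep(node, x, y):
    """Evaluate the tree for bins x and y, each a (type, index) tuple."""
    kind = node["kind"]
    if kind == "or":
        return any(keep(child, x, y) for child in node["pair_selectors"])
    if kind == "and":
        return all(keep(child, x, y) for child in node["pair_selectors"])
    if kind == "auto-name":
        return x == y  # assumed: same bin on both sides
    if kind in ("source", "lens"):
        return x[0] == kind and y[0] == kind  # assumed: both bins of this type
    raise ValueError(f"unknown kind: {kind}")


# Assumption: five source and five lens bins, indexed 0..4.
bins = [(bin_type, i) for bin_type in ("source", "lens") for i in range(5)]
pairs = list(combinations_with_replacement(bins, 2))
kept = [(x, y) for x, y in pairs if keep(tree, x, y)]

print(len(pairs), len(kept))  # 55 20
```

Under these assumptions the tree keeps 5 source auto-correlations plus 15 lens-lens pairs, which agrees with the 20 pairs reported for the loaded selector.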

Summary

Bin pair selectors provide a declarative way to control which bin pairs are included in your analysis:

Atomic selectors: Auto, cross, source, lens, named pairs, index-based constraints
Physical relevance: Index-based selectors enforce redshift ordering conventions
Building blocks: Combine atomic selectors for domain-specific physics
Logical operators: AND (&), OR (|), NOT (~)
Integration: Works with both generators and SACC extraction
Serialization: Save and load selector configurations

Key Selectors for Physical Constraints

All (Source, Lens) pairs

Selector: SourceLensBinPairSelector
Notes: Fundamental building block

Galaxy-galaxy lensing (with naming convention)

Selector: SourceLensBinPairSelector() & CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
Notes: Assumes index ∝ redshift

Nearby bins (with naming convention)

Selector: AutoNameDiffBinPairSelector(neighbors_diff=[0, 1, -1])
Notes: Limits index differences

Same data source

Selector: TypeSourceBinPairSelector(type_source=…)
Notes: Filter by provenance

Standard 3×2pt (with naming convention)

Selector: ThreeTwoBinPairSelector()
Notes: Pre-built composite

Custom high-S/N

Selector: ThreeTwoBinPairSelector(source_dist=2, lens_dist=2, source_lens_dist=1)
Notes: Conservative distances

Key Functions

Function	Purpose	Selector Parameter
`make_binned_two_point_filtered`	Filter generated bins	Required
`filter_two_point_combinations`	Filter existing combinations	Required
`extract_all_photoz_bin_combinations`	Filter SACC bin combinations	Optional
`extract_all_harmonic_metadata`	Filter SACC harmonic data	Optional
`extract_all_real_metadata`	Filter SACC real-space data	Optional

Critical Takeaways

Key Principles

Start with measurement types: SourceLensBinPairSelector selects (Source, Lens) pairs
Add constraints as appropriate: Use naming conventions, manual lists, or (in the future) dndz inspection
Building blocks enable flexibility: Compose atomic selectors for your specific requirements
Different approaches are valid: Index-based filtering is convenient but not the only way

Best Practices

Start simple, add complexity: Begin with measurement-type selectors, then add constraints
Understand your data: Know whether your bins follow naming conventions or need other filtering
Verify your selection: Check that the resulting pairs make sense for your analysis
Document your choices: Be explicit about assumptions (e.g., “assumes index ∝ redshift”)
Create custom composites: If you reuse a pattern, make it a CompositeSelector

Next Steps

Now that you understand bin pair selectors:

Generators: Apply selectors when generating metadata
Loading SACC Data: Apply selectors when extracting from SACC
Workflow Guide: See selectors in the complete analysis workflow
Factory Basics: Construct TwoPoint objects from filtered data
