Bin Pair Selectors: Filtering Two-Point Correlations
Version ?env:FIRECROWN_VERSION
Purpose of this Document
This tutorial explains how to use BinPairSelector objects to control which pairs of tomographic bins are included in two-point analyses. Selectors provide a powerful way to filter correlations based on bin names, measurement types, or custom criteria.
Why Use Bin Pair Selectors?
When working with two-point statistics, you often need to include only specific subsets of all possible bin pair combinations:
- Auto-correlations only: Same bin correlated with itself
- Cross-correlations only: Different bins correlated together
- Source measurements only: Weak lensing (shear) correlations
- Lens measurements only: Galaxy counts correlations
- Specific bin pairs: Explicitly named combinations
- Physical consistency: Avoid unphysical correlations (e.g., lens behind source)
- Redshift constraints: Limit correlations based on tomographic bin separation
- Custom criteria: Combine multiple conditions with logical operators
Bin pair selectors let you express these requirements declaratively, making your code clearer and more maintainable.
Basic Concepts
A BinPairSelector is applied when creating TwoPointXY combinations from InferredGalaxyZDist bins. It determines whether to keep or discard each potential pair.
The BinPairSelector.keep Method
Every selector implements a BinPairSelector.keep method that takes:
- zdist: A pair of InferredGalaxyZDist objects (bin1, bin2)
- m: A pair of Measurement types (measurement1, measurement2)
The method returns True to keep the pair, False to discard it.
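To make the keep contract concrete, here is a minimal stand-alone sketch. ToyBin, ToyMeasurement, and SameNameSelector are hypothetical stand-ins invented for illustration; they are not Firecrown classes, and the real InferredGalaxyZDist and Measurement types carry more information.

```python
from dataclasses import dataclass
from enum import Enum


class ToyMeasurement(Enum):
    """Hypothetical stand-in for a measurement type."""

    COUNTS = "counts"
    SHEAR_E = "shear_e"


@dataclass(frozen=True)
class ToyBin:
    """Hypothetical stand-in for a tomographic bin; only the name is modeled."""

    bin_name: str


class SameNameSelector:
    """Toy selector: keep a pair only when both bins share a name."""

    def keep(self, zdist, m):
        bin1, bin2 = zdist  # the pair of bins
        # m (the pair of measurements) is accepted but unused by this rule
        return bin1.bin_name == bin2.bin_name


selector = SameNameSelector()
lens0, lens1 = ToyBin("lens0"), ToyBin("lens1")
counts = (ToyMeasurement.COUNTS, ToyMeasurement.COUNTS)

kept_auto = selector.keep((lens0, lens0), counts)
kept_cross = selector.keep((lens0, lens1), counts)
print(kept_auto, kept_cross)
```

The same shape (two pairs in, one boolean out) is all that any selector rule needs, whatever it inspects.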
Physical Relevance in Bin Selection
When selecting bin pairs for two-point analyses, you may need to ensure that the selected pairs are physically meaningful for your analysis. Not all mathematically possible bin combinations may correspond to useful correlations.
Galaxy-Galaxy Lensing: A Key Example
Consider galaxy-galaxy lensing, which measures the correlation between:
- Source galaxies (background): Whose shapes are distorted by lensing
- Lens galaxies (foreground): Whose gravitational field causes the lensing
For lensing to occur, the lens must be in front of the source (at lower or comparable redshift). How you ensure this depends on your data and naming conventions.
The Building Block Approach
Firecrown’s SourceLensBinPairSelector is a fundamental building block that selects all (Source, Lens) measurement pairs. By itself, it doesn’t enforce any redshift ordering; it simply selects based on measurement types.
If your bins follow a naming convention where indices correspond to increasing redshift (e.g., bin0 < bin1 < bin2 in redshift), you can combine SourceLensBinPairSelector with NameDiffBinPairSelector to select only pairs where the lens could plausibly be in front of the source.
Other approaches are possible: In the future, Firecrown may include selectors that directly inspect the dndz distributions of bins to determine redshift ordering, providing an alternative to naming-convention-based filtering.
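As a rough illustration of what a distribution-based check could look like, the sketch below compares the mean redshift of two binned distributions. Everything here is an assumption made for illustration: mean_redshift and lens_in_front are hypothetical helpers, the toy z and dndz values are made up, and comparing means is only one of several possible criteria. It is not an existing Firecrown selector.

```python
def mean_redshift(z, dndz):
    """Weighted mean redshift of a binned distribution (uniform grid assumed)."""
    total = sum(dndz)
    return sum(zi * wi for zi, wi in zip(z, dndz)) / total


def lens_in_front(lens, source):
    """True when the lens mean redshift does not exceed the source mean."""
    return mean_redshift(*lens) <= mean_redshift(*source)


# Made-up distributions on a shared redshift grid
z = [0.2, 0.4, 0.6, 0.8, 1.0]
lens_dndz = [1.0, 3.0, 1.0, 0.0, 0.0]    # peaks at z = 0.4
source_dndz = [0.0, 0.0, 1.0, 3.0, 1.0]  # peaks at z = 0.8

ok = lens_in_front((z, lens_dndz), (z, source_dndz))
reversed_ok = lens_in_front((z, source_dndz), (z, lens_dndz))
print(ok, reversed_ok)
```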
Many surveys use a convention where tomographic bins are numbered in order of increasing redshift:
- lsst_y1_source0 contains sources at the lowest redshift
- lsst_y1_source4 contains sources at the highest redshift
- Similarly: lsst_y1_lens0 < lsst_y1_lens1 < … in redshift
If your bins follow this convention, NameDiffBinPairSelector provides a convenient way to filter by redshift proximity. If not, you may need different selectors or manual filtering.
Example: Combining Selectors for Redshift Constraints
Here’s how you can add redshift constraints if your bins follow the naming convention:
from firecrown.metadata_types import (
SourceLensBinPairSelector,
NameDiffBinPairSelector,
)
from firecrown.generators import (
LSST_Y1_LENS_HARMONIC_BIN_COLLECTION,
LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION,
)
from firecrown.metadata_functions import (
make_binned_two_point_filtered,
filter_two_point_combinations,
make_all_photoz_bin_combinations,
)
# Generate bins (these follow the naming convention: higher index = higher redshift)
count_bins = LSST_Y1_LENS_HARMONIC_BIN_COLLECTION.generate()
shear_bins = LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION.generate()
all_bins = count_bins + shear_bins
# Basic approach: ALL source-lens pairs (measurement types only)
basic_selector = SourceLensBinPairSelector()
basic_pairs = make_binned_two_point_filtered(all_bins, basic_selector)
# Redshift-constrained approach: Add index-based filtering
# This works because LSST bins follow the naming convention (index ∝ redshift)
# neighbors_diff=[0, 1] means: source_index - lens_index ∈ {0, 1}
# i.e., lens at same or slightly lower redshift
redshift_constrained_selector = SourceLensBinPairSelector() & NameDiffBinPairSelector(
same_name_prefix=False, # Different prefixes (source vs lens)
neighbors_diff=[0, 1], # lens_index ∈ {source_index, source_index - 1}
)
redshift_constrained_pairs = make_binned_two_point_filtered(all_bins, redshift_constrained_selector)
print(f"Basic selection (measurement types): {len(basic_pairs)} pairs")
print(f"Redshift-constrained selection: {len(redshift_constrained_pairs)} pairs")
print(f"Filtered out {len(basic_pairs) - len(redshift_constrained_pairs)} pairs based on index")
Basic selection (measurement types): 25 pairs
Redshift-constrained selection: 9 pairs
Filtered out 16 pairs based on index
Here are the pairs kept by the redshift-constrained selector:
import pandas as pd
from IPython.display import Markdown
# Show sample of redshift-constrained pairs
pairs_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in redshift_constrained_pairs
]
df = pd.DataFrame(pairs_table)
display(Markdown(df.to_markdown(index=False)))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| lsst_y1_source0 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source1 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source1 | lsst_y1_lens1 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source2 | lsst_y1_lens1 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source2 | lsst_y1_lens2 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source3 | lsst_y1_lens2 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source3 | lsst_y1_lens3 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source4 | lsst_y1_lens3 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source4 | lsst_y1_lens4 | Galaxies.SHEAR_E | Galaxies.COUNTS |
The selector configuration:
print(f"Selector: {redshift_constrained_selector}")
print(f"\nComponents:")
print(f" - SourceLensBinPairSelector: selects (Source, Lens) measurement pairs")
print(f" - NameDiffBinPairSelector:")
print(f" same_name_prefix=False (different bin types allowed)")
print(f" neighbors_diff=[0, 1] (source_index - lens_index ∈ {{0, 1}})")
Selector: kind='and' pair_selectors=[SourceLensBinPairSelector(kind='source-lens'), NameDiffBinPairSelector(kind='name-diff', same_name_prefix=False, neighbors_diff=[0, 1])]
Components:
- SourceLensBinPairSelector: selects (Source, Lens) measurement pairs
- NameDiffBinPairSelector:
same_name_prefix=False (different bin types allowed)
neighbors_diff=[0, 1] (source_index - lens_index ∈ {0, 1})
Firecrown provides atomic selector building blocks rather than prescribing one “correct” way to select bin pairs. This flexibility is essential because:
- Different analyses have different requirements for which pairs to include
- Physical constraints can be encoded in different ways (naming conventions, dndz inspection, manual lists, etc.)
- Redshift overlap between bins varies by survey
- Some analyses may intentionally include broader selections
SourceLensBinPairSelector is the fundamental building block: it selects (Source, Lens) measurement pairs. You can then combine it with other selectors to add constraints appropriate to your data:
- If using naming conventions: Combine with NameDiffBinPairSelector
- If listing pairs explicitly: Combine with NamedBinPairSelector
- Future possibilities: Selectors that inspect dndz distributions directly
When a selector pattern proves broadly useful (like 3×2pt with the naming convention), it’s added as a pre-built CompositeSelector.
Common Selector Types
Below are some common selectors you might use in your code. Note that we show only a fraction of their output for brevity.
Auto-Correlation Selectors
Select only pairs where both bins are the same:
from firecrown.metadata_types import AutoNameBinPairSelector
from firecrown.generators import (
LSST_Y1_LENS_HARMONIC_BIN_COLLECTION,
LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION,
)
from firecrown.metadata_functions import make_binned_two_point_filtered
# Generate LSST Y1 bins
count_bins = LSST_Y1_LENS_HARMONIC_BIN_COLLECTION.generate()
shear_bins = LSST_Y1_SOURCE_HARMONIC_BIN_COLLECTION.generate()
all_bins = count_bins + shear_bins
# Select only auto-correlations (same bin name)
auto_name_selector = AutoNameBinPairSelector()
auto_name_pairs = make_binned_two_point_filtered(all_bins, auto_name_selector)
print(f"Total bins: {len(all_bins)}")
print(f"Auto-correlation pairs (by name): {len(auto_name_pairs)}")
Total bins: 10
Auto-correlation pairs (by name): 10
To select pairs by matching measurement type instead, use AutoMeasurementBinPairSelector. This selector requires only the measurement types to match, not the bin names:
from firecrown.metadata_types import AutoMeasurementBinPairSelector
# Select pairs with same measurement type
auto_measurement_selector = AutoMeasurementBinPairSelector()
auto_measurement_pairs = make_binned_two_point_filtered(
all_bins, auto_measurement_selector
)
print(f"Auto-correlation pairs (by measurement): {len(auto_measurement_pairs)}")
Auto-correlation pairs (by measurement): 30
Display the auto-correlation pairs:
import pandas as pd
from IPython.display import Markdown
auto_measurement_pairs_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in auto_measurement_pairs[::3] # Show every 3rd pair
]
df = pd.DataFrame(auto_measurement_pairs_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| lsst_y1_source0 | lsst_y1_source0 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source0 | lsst_y1_source3 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source1 | lsst_y1_source2 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source2 | lsst_y1_source2 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source3 | lsst_y1_source3 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_lens0 | lsst_y1_lens0 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens0 | lsst_y1_lens3 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens1 | lsst_y1_lens2 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens2 | lsst_y1_lens2 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens3 | lsst_y1_lens3 | Galaxies.COUNTS | Galaxies.COUNTS |
Cross-Correlation Selectors
Select only pairs where bins are different:
from firecrown.metadata_types import CrossNameBinPairSelector
# Select only cross-correlations (different bin names)
cross_selector = CrossNameBinPairSelector()
cross_pairs = make_binned_two_point_filtered(all_bins, cross_selector)
print(f"Cross-correlation pairs: {len(cross_pairs)}")
Cross-correlation pairs: 45
Display the cross-correlation pairs:
import pandas as pd
from IPython.display import Markdown
cross_pairs_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in cross_pairs[::5] # Show every 5th pair
]
df = pd.DataFrame(cross_pairs_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| lsst_y1_source0 | lsst_y1_source1 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source1 | lsst_y1_source3 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source0 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source1 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source2 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source3 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source4 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_lens0 | lsst_y1_lens1 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens1 | lsst_y1_lens3 | Galaxies.COUNTS | Galaxies.COUNTS |
Measurement Type Selectors
Select based on the type of measurement:
from firecrown.metadata_types import SourceBinPairSelector, LensBinPairSelector
# Source measurements (weak lensing shear)
source_selector = SourceBinPairSelector()
source_pairs = make_binned_two_point_filtered(all_bins, source_selector)
# Lens measurements (galaxy counts)
lens_selector = LensBinPairSelector()
lens_pairs = make_binned_two_point_filtered(all_bins, lens_selector)
print(f"Source (shear) pairs: {len(source_pairs)}")
print(f"Lens (counts) pairs: {len(lens_pairs)}")
Source (shear) pairs: 15
Lens (counts) pairs: 15
Source-lens cross-correlations:
from firecrown.metadata_types import SourceLensBinPairSelector
# Source × Lens cross-correlations
source_lens_selector = SourceLensBinPairSelector()
source_lens_pairs = make_binned_two_point_filtered(all_bins, source_lens_selector)
print(f"Source × Lens pairs: {len(source_lens_pairs)}")
Source × Lens pairs: 25
Display source-lens pairs:
import pandas as pd
from IPython.display import Markdown
source_lens_pairs_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in source_lens_pairs[::5] # Show every 5th pair
]
df = pd.DataFrame(source_lens_pairs_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| lsst_y1_source0 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source1 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source2 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source3 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source4 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
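The pair counts printed so far can be cross-checked with plain combinatorics. The sketch below assumes that candidate pairs are unordered combinations (with repetition) of the ten bins, with the five source bins listed before the five lens bins, as the displayed tables suggest. The (kind, index) tuples are simplified stand-ins, not Firecrown objects.

```python
from itertools import combinations_with_replacement

# Simplified stand-ins: (kind, index) tuples, sources listed before lenses
bins = [("source", i) for i in range(5)] + [("lens", i) for i in range(5)]
pairs = list(combinations_with_replacement(bins, 2))

auto_name = [p for p in pairs if p[0] == p[1]]
cross_name = [p for p in pairs if p[0] != p[1]]
same_measurement = [p for p in pairs if p[0][0] == p[1][0]]
source_source = [p for p in pairs if p[0][0] == p[1][0] == "source"]
lens_lens = [p for p in pairs if p[0][0] == p[1][0] == "lens"]
source_lens = [p for p in pairs if (p[0][0], p[1][0]) == ("source", "lens")]

print(len(pairs), len(auto_name), len(cross_name))
print(len(same_measurement), len(source_source), len(lens_lens), len(source_lens))
```

Under these assumptions the counts agree with the outputs above: 10 auto-name, 45 cross-name, 30 same-measurement, 15 source, 15 lens, and 25 source-lens pairs.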
Named Pair Selectors
Select specific bin name combinations explicitly:
from firecrown.metadata_types import NamedBinPairSelector
# Select specific pairs by name
named_selector = NamedBinPairSelector(
names=[
("lsst_y1_lens0", "lsst_y1_lens1"),
("lsst_y1_source0", "lsst_y1_source0"),
("lsst_y1_source0", "lsst_y1_lens0"),
]
)
named_pairs = make_binned_two_point_filtered(all_bins, named_selector)
print(f"Named pairs: {len(named_pairs)}")
Named pairs: 3
Note: Matching is order-dependent. ("bin0", "bin1") is different from ("bin1", "bin0"). Include both if you want symmetric matching.
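The order dependence can be illustrated with plain tuples. This is a sketch, assuming matching amounts to an exact comparison of (name_x, name_y) tuples; symmetrize is a hypothetical helper, not part of Firecrown.

```python
def symmetrize(names):
    """Return the name pairs plus their reversed counterparts, without duplicates."""
    result = []
    for a, b in names:
        for candidate in ((a, b), (b, a)):
            if candidate not in result:
                result.append(candidate)
    return result


names = [("bin0", "bin1"), ("bin2", "bin2")]

# Exact tuple comparison treats the two orders as different pairs
forward_match = ("bin0", "bin1") in names
reverse_match = ("bin1", "bin0") in names

symmetric_names = symmetrize(names)
print(forward_match, reverse_match)
print(symmetric_names)
```

A list built this way could then be passed as the names argument of NamedBinPairSelector when symmetric matching is wanted.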
Display named pairs:
import pandas as pd
from IPython.display import Markdown
named_pairs_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in named_pairs
]
df = pd.DataFrame(named_pairs_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| lsst_y1_source0 | lsst_y1_source0 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source0 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_lens0 | lsst_y1_lens1 | Galaxies.COUNTS | Galaxies.COUNTS |
Index-Based Selectors
Index-based selectors filter pairs based on the numeric indices in bin names and their text prefixes. These are relevant for implementing physical constraints based on the redshift ordering convention.
Understanding Bin Name Structure
Bin names follow the pattern <prefix><number>:
- lsst_y1_source0 → prefix: lsst_y1_source, index: 0
- lsst_y1_lens3 → prefix: lsst_y1_lens, index: 3
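A small sketch of how such a name can be split into prefix and index. The regular expression and the split_bin_name helper are assumptions for illustration; Firecrown's own parsing may differ in detail.

```python
import re

# Trailing digits form the index; everything before them is the prefix
_BIN_NAME = re.compile(r"^(?P<prefix>.*?)(?P<index>\d+)$")


def split_bin_name(name):
    """Split 'lsst_y1_source0' into ('lsst_y1_source', 0)."""
    match = _BIN_NAME.match(name)
    if match is None:
        raise ValueError(f"bin name {name!r} does not end in a number")
    return match.group("prefix"), int(match.group("index"))


parsed_source = split_bin_name("lsst_y1_source0")
parsed_lens = split_bin_name("lsst_y1_lens3")
print(parsed_source, parsed_lens)
```

Note that the digit inside the prefix (the 1 in y1) stays with the prefix, because only the trailing run of digits is treated as the index.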
NameDiffBinPairSelector: Index-Based Filtering
This selector is useful when your bins follow a naming convention where the numeric index correlates with some property (typically redshift). It filters based on:
- Whether bin prefixes match (same_name_prefix)
- The difference between bin indices (neighbors_diff)
The index difference is computed as: left_index - right_index
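The rule can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Firecrown's implementation: keep_by_name_diff is a hypothetical helper, names are split at their trailing digits, neighbors_diff is taken to be a list of allowed signed differences, and same_name_prefix=False is read as "the prefixes must differ".

```python
import re


def keep_by_name_diff(left, right, same_name_prefix, neighbors_diff):
    """Toy version of the prefix and index-difference rule."""
    left_prefix, left_index = re.fullmatch(r"(.*?)(\d+)", left).groups()
    right_prefix, right_index = re.fullmatch(r"(.*?)(\d+)", right).groups()
    # The prefixes must match (or differ) as requested
    if (left_prefix == right_prefix) != same_name_prefix:
        return False
    # Signed difference: left_index - right_index
    return int(left_index) - int(right_index) in neighbors_diff


# Source one index above the lens: difference is +1, allowed
kept = keep_by_name_diff("lsst_y1_source1", "lsst_y1_lens0", False, [0, 1])
# Source two indices below the lens: difference is -2, rejected
rejected = keep_by_name_diff("lsst_y1_source0", "lsst_y1_lens2", False, [0, 1])
# Same prefix requested but the prefixes differ: rejected
wrong_prefix = keep_by_name_diff("lsst_y1_source0", "lsst_y1_lens0", True, [0])
print(kept, rejected, wrong_prefix)
```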
from firecrown.metadata_types import NameDiffBinPairSelector
# Example 1: Same prefix, adjacent or identical bins
# Keeps bins differing by 0, ±1 in index
same_prefix_selector = NameDiffBinPairSelector(
same_name_prefix=True,
neighbors_diff=[0, 1, -1]
)
# Example 2: Different prefixes, lens at lower/equal redshift
# For galaxy-galaxy lensing: source_index - lens_index ∈ {0, 1}
diff_prefix_selector = NameDiffBinPairSelector(
same_name_prefix=False,
neighbors_diff=[0, 1]
)
# Apply to our bins
same_prefix_pairs = make_binned_two_point_filtered(all_bins, same_prefix_selector)
diff_prefix_pairs = make_binned_two_point_filtered(all_bins, diff_prefix_selector)
print(f"Same prefix, nearby indices: {len(same_prefix_pairs)} pairs")
print(f"Different prefix, controlled indices: {len(diff_prefix_pairs)} pairs")
Same prefix, nearby indices: 18 pairs
Different prefix, controlled indices: 9 pairs
Key behaviors:
| Configuration | Keeps | Rejects | Use Case |
|---|---|---|---|
| same_name_prefix=True, neighbors_diff=1 | (bin0, bin1), (bin1, bin0) | (bin0, bin2), (bin2, bin2), (src0, bin0) | Adjacent bins, same type |
| same_name_prefix=False, neighbors_diff=[0, 1] | (src0, lens0), (src1, lens0), (src1, lens1) | (src0, lens2), (lens1, src0) | Cross-type, redshift constraint |
| same_name_prefix=True, neighbors_diff=0 | (bin0, bin0), (bin1, bin1) | (bin0, bin1) | Exact auto-correlations |
AutoNameDiffBinPairSelector
Convenience selector for same_name_prefix=True:
from firecrown.metadata_types import AutoNameDiffBinPairSelector
# Equivalent to NameDiffBinPairSelector(same_name_prefix=True, neighbors_diff=[0, 1, -1])
auto_neighbor_selector = AutoNameDiffBinPairSelector(neighbors_diff=[0, 1, -1])
auto_neighbor_pairs = make_binned_two_point_filtered(all_bins, auto_neighbor_selector)
print(f"Auto with neighbors: {len(auto_neighbor_pairs)} pairs")
Auto with neighbors: 18 pairs
Use case: When bins follow a naming convention with increasing redshift, this selects correlations between nearby redshift bins of the same type (e.g., source bins or lens bins).
CrossNameDiffBinPairSelector
Convenience selector for same_name_prefix=False:
from firecrown.metadata_types import CrossNameDiffBinPairSelector
# Equivalent to NameDiffBinPairSelector(same_name_prefix=False, neighbors_diff=[0, 1])
cross_neighbor_selector = CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
cross_neighbor_pairs = make_binned_two_point_filtered(all_bins, cross_neighbor_selector)
print(f"Cross-type with neighbors: {len(cross_neighbor_pairs)} pairs")
Cross-type with neighbors: 9 pairs
Use case: When bins follow a naming convention with increasing redshift, this selects cross-type correlations (e.g., source-lens) with redshift constraints based on index differences.
Display some cross-type neighbor pairs:
import pandas as pd
from IPython.display import Markdown
cross_neighbor_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"index-diff": int(pair.x.bin_name[-1]) - int(pair.y.bin_name[-1]),
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in cross_neighbor_pairs[::5] # Every 5th pair
]
df = pd.DataFrame(cross_neighbor_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | index-diff | measurement-x | measurement-y |
|---|---|---|---|---|
| lsst_y1_source0 | lsst_y1_lens0 | 0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source3 | lsst_y1_lens2 | 1 | Galaxies.SHEAR_E | Galaxies.COUNTS |
Logical Combinations
Selectors support logical operations to build complex criteria:
AND Operator (&)
Keep pairs that satisfy both conditions:
# Auto-correlations that are also source measurements
auto_source_selector = AutoNameBinPairSelector() & SourceBinPairSelector()
auto_source_pairs = make_binned_two_point_filtered(all_bins, auto_source_selector)
print(f"Auto-correlation source pairs: {len(auto_source_pairs)}")
Auto-correlation source pairs: 5
Display auto-correlation source pairs:
import pandas as pd
from IPython.display import Markdown
auto_source_pairs_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in auto_source_pairs
]
df = pd.DataFrame(auto_source_pairs_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| lsst_y1_source0 | lsst_y1_source0 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source1 | lsst_y1_source1 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source2 | lsst_y1_source2 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source3 | lsst_y1_source3 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source4 | lsst_y1_source4 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
OR Operator (|)
Keep pairs that satisfy either condition:
# Either auto-correlations OR source-lens cross-correlations
mixed_selector = AutoNameBinPairSelector() | SourceLensBinPairSelector()
mixed_pairs = make_binned_two_point_filtered(all_bins, mixed_selector)
print(f"Auto OR source-lens pairs: {len(mixed_pairs)}")
Auto OR source-lens pairs: 35
Display mixed pairs:
import pandas as pd
from IPython.display import Markdown
mixed_pairs_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in mixed_pairs[::5] # Show every 5th pair
]
df = pd.DataFrame(mixed_pairs_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| lsst_y1_source0 | lsst_y1_source0 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source0 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source1 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source2 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source3 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source4 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_lens0 | lsst_y1_lens0 | Galaxies.COUNTS | Galaxies.COUNTS |
NOT Operator (~)
Keep pairs that do not satisfy the condition:
# Everything except auto-correlations (i.e., cross-correlations)
not_auto_selector = ~AutoNameBinPairSelector()
not_auto_pairs = make_binned_two_point_filtered(all_bins, not_auto_selector)
print(f"Non-auto (cross) pairs: {len(not_auto_pairs)}")
Non-auto (cross) pairs: 45
Display non-auto pairs:
import pandas as pd
from IPython.display import Markdown
not_auto_pairs_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in not_auto_pairs[::5] # Show every 5th pair
]
df = pd.DataFrame(not_auto_pairs_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| lsst_y1_source0 | lsst_y1_source1 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source1 | lsst_y1_source3 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source0 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source1 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source2 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source3 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source4 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_lens0 | lsst_y1_lens1 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens1 | lsst_y1_lens3 | Galaxies.COUNTS | Galaxies.COUNTS |
Complex Combinations
Combine multiple operators:
# (Auto-correlations AND source measurements) OR (lens auto-correlations)
complex_selector = (
(AutoNameBinPairSelector() & SourceBinPairSelector()) |
(AutoNameBinPairSelector() & LensBinPairSelector())
)
complex_pairs = make_binned_two_point_filtered(all_bins, complex_selector)
print(f"Complex selection: {len(complex_pairs)}")
Complex selection: 10
Display complex pairs:
import pandas as pd
from IPython.display import Markdown
complex_pairs_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in complex_pairs[::2] # Show every 2nd pair
]
df = pd.DataFrame(complex_pairs_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| lsst_y1_source0 | lsst_y1_source0 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source2 | lsst_y1_source2 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source4 | lsst_y1_source4 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_lens1 | lsst_y1_lens1 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens3 | lsst_y1_lens3 | Galaxies.COUNTS | Galaxies.COUNTS |
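One way to picture how the operators in this section compose is ordinary Python operator overloading. The sketch below is a toy model: the Predicate class and the lambda rules are hypothetical and operate on (kind, index) tuples rather than Firecrown objects, so it shows the idea rather than how Firecrown implements it.

```python
class Predicate:
    """Toy selector wrapping a function pair -> bool, with &, | and ~ support."""

    def __init__(self, func):
        self.func = func

    def keep(self, pair):
        return self.func(pair)

    def __and__(self, other):
        return Predicate(lambda pair: self.keep(pair) and other.keep(pair))

    def __or__(self, other):
        return Predicate(lambda pair: self.keep(pair) or other.keep(pair))

    def __invert__(self):
        return Predicate(lambda pair: not self.keep(pair))


# Pairs are ((kind, index), (kind, index)) tuples in this toy model
auto_name = Predicate(lambda pair: pair[0] == pair[1])
source_only = Predicate(lambda pair: pair[0][0] == pair[1][0] == "source")
lens_only = Predicate(lambda pair: pair[0][0] == pair[1][0] == "lens")

# Same shape as the complex selector: (auto & source) | (auto & lens)
combined = (auto_name & source_only) | (auto_name & lens_only)

source_auto = (("source", 0), ("source", 0))
source_cross = (("source", 0), ("source", 1))
print(combined.keep(source_auto), combined.keep(source_cross))
print((~auto_name).keep(source_cross))
```

Each operator returns a new predicate, so combinations nest to any depth without modifying the selectors they are built from.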
Composite Selectors
Composite selectors are specialized selectors that combine multiple simpler selectors according to specific logic. Unlike using logical operators (&, |, ~) directly, composite selectors implement domain-specific selection patterns as reusable classes.
Understanding Composite Selectors
A CompositeSelector is a base class for selectors that internally manage a list of other selectors. When you use composite selectors, you benefit from:
- Encapsulation: Complex logic is packaged into a single, named selector
- Reusability: Common patterns (like “auto-correlations”, “3×2pt”) can be used consistently
- Clarity: Code intent is clearer with descriptive selector names
Example: The ThreeTwoBinPairSelector
The ThreeTwoBinPairSelector is a composite selector designed for “3×2pt” analyses, which combine three types of two-point correlations:
- Cosmic shear (source × source)
- Galaxy-galaxy lensing (source × lens)
- Galaxy clustering (lens × lens)
This is a standard observable combination in weak lensing surveys like DES, HSC, and LSST.
from firecrown.metadata_types import ThreeTwoBinPairSelector
# Create a 3×2pt selector
three_two_selector = ThreeTwoBinPairSelector(
source_dist=1, lens_dist=1, source_lens_dist=5
)
# Apply to our bins
three_two_pairs = make_binned_two_point_filtered(all_bins, three_two_selector)
print(f"3×2pt pairs: {len(three_two_pairs)}")
3×2pt pairs: 28
With these settings, the 3×2pt selector keeps the following pairs, as the table below shows:
- Cosmic shear auto-correlations and adjacent-bin cross-correlations (source × source)
- Galaxy-galaxy lensing pairs in which the source index exceeds the lens index (source × lens)
- Galaxy clustering auto-correlations and adjacent-bin cross-correlations (lens × lens)
Display 3×2pt pairs:
import pandas as pd
from IPython.display import Markdown
three_two_pairs_table = [
{
"bin-x": pair.x.bin_name,
"bin-y": pair.y.bin_name,
"measurement-x": str(pair.x_measurement),
"measurement-y": str(pair.y_measurement),
}
for pair in three_two_pairs
]
df = pd.DataFrame(three_two_pairs_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| lsst_y1_source0 | lsst_y1_source0 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source0 | lsst_y1_source1 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source1 | lsst_y1_source1 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source1 | lsst_y1_source2 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source2 | lsst_y1_source2 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source2 | lsst_y1_source3 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source3 | lsst_y1_source3 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source3 | lsst_y1_source4 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source4 | lsst_y1_source4 | Galaxies.SHEAR_E | Galaxies.SHEAR_E |
| lsst_y1_source1 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source2 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source2 | lsst_y1_lens1 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source3 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source3 | lsst_y1_lens1 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source3 | lsst_y1_lens2 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source4 | lsst_y1_lens0 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source4 | lsst_y1_lens1 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source4 | lsst_y1_lens2 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_source4 | lsst_y1_lens3 | Galaxies.SHEAR_E | Galaxies.COUNTS |
| lsst_y1_lens0 | lsst_y1_lens0 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens0 | lsst_y1_lens1 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens1 | lsst_y1_lens1 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens1 | lsst_y1_lens2 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens2 | lsst_y1_lens2 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens2 | lsst_y1_lens3 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens3 | lsst_y1_lens3 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens3 | lsst_y1_lens4 | Galaxies.COUNTS | Galaxies.COUNTS |
| lsst_y1_lens4 | lsst_y1_lens4 | Galaxies.COUNTS | Galaxies.COUNTS |
Creating Custom Composite Selectors
You can create your own composite selectors for domain-specific patterns. Here’s how to implement one:
from typing import Any
from firecrown.metadata_types import CompositeSelector, BinPairSelector
from firecrown.metadata_types import register_bin_pair_selector
from pydantic import Field
@register_bin_pair_selector
class CustomTwoTwoBinPairSelector(CompositeSelector):
    """Select 2×2pt: cosmic shear + galaxy clustering (no galaxy-galaxy lensing)."""

    kind: str = "custom_2x2pt"

    def model_post_init(self, _: Any, /) -> None:
        self._impl = SourceBinPairSelector() | LensBinPairSelector()
# Use the custom selector
two_two_selector = CustomTwoTwoBinPairSelector()
two_two_pairs = make_binned_two_point_filtered(all_bins, two_two_selector)
print(f"Custom 2×2pt pairs (no galaxy-galaxy lensing): {len(two_two_pairs)}")
Custom 2×2pt pairs (no galaxy-galaxy lensing): 30
When to Use Composite Selectors
Use composite selectors when:
- You have a well-defined, reusable selection pattern
- The pattern combines multiple criteria in a specific way
- You want to give the pattern a meaningful name (like “3×2pt”)
- You need to serialize and share the pattern across analyses
Use direct logical operators (&, |, ~) when:
- You need a one-off combination
- The logic is simple and self-explanatory
- You’re experimenting with different criteria
Combining Composite Selectors with Logical Operators
Composite selectors can be combined with other selectors using logical operators:
# 3×2pt but remove (lsst_y1_source4, lsst_y1_lens3) pair
three_two_rm = ThreeTwoBinPairSelector() & ~NamedBinPairSelector(
names=[("lsst_y1_source4", "lsst_y1_lens3")]
)
# Use the combined selector
three_two_rm_pairs = make_binned_two_point_filtered(all_bins, three_two_rm)
print(
f"3×2pt pairs with (lsst_y1_source4, lsst_y1_lens3) removed: {len(three_two_rm_pairs)}"
)
3×2pt pairs with (lsst_y1_source4, lsst_y1_lens3) removed: 39
This flexibility allows you to start with standard patterns (like 3×2pt) and refine them as needed for your specific analysis.
Advanced Physical Constraints
Beyond the pre-built selectors, you can construct sophisticated physical selections by combining multiple building blocks.
Combining Multiple Requirements
Real analyses often need to combine several constraints:
from firecrown.metadata_types import (
SourceLensBinPairSelector,
CrossNameDiffBinPairSelector,
NamedBinPairSelector,
)
# Galaxy-galaxy lensing with multiple constraints
# (assumes bins follow naming convention where index correlates with redshift)
multi_constrained_gl = (
SourceLensBinPairSelector() # Select (Source, Lens) measurement pairs
& CrossNameDiffBinPairSelector(neighbors_diff=[0, 1, 2]) # Index-based constraint
& ~NamedBinPairSelector( # Remove specific problematic pairs
names=[
("lsst_y1_source4", "lsst_y1_lens3"), # Example: noisy pair
]
)
)
multi_constrained_gl_pairs = make_binned_two_point_filtered(all_bins, multi_constrained_gl)
print(f"Multi-constrained galaxy-galaxy lensing: {len(multi_constrained_gl_pairs)} pairs")
Multi-constrained galaxy-galaxy lensing: 11 pairs
This combines:
- Measurement type filtering: Only source-lens pairs
- Index-based constraints: Keep only pairs whose bin indices differ by 0, 1, or 2 (assumes a naming convention in which the index tracks redshift)
- Manual exclusions: Remove specific pairs known to be problematic
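As a sanity check on the count of 11, the same three conditions can be reproduced in plain Python. This sketch assumes five source and five lens bins indexed 0 to 4 (consistent with the bin names used in this tutorial) and that the index constraint applies to the source index minus the lens index. It models the logic only and does not call Firecrown.

```python
from itertools import product

n_bins = 5  # assumption: lsst_y1_source0..4 and lsst_y1_lens0..4
allowed_diffs = {0, 1, 2}  # mirrors neighbors_diff=[0, 1, 2]
excluded = {("lsst_y1_source4", "lsst_y1_lens3")}  # mirrors the negated named selector

kept = []
for s, l in product(range(n_bins), repeat=2):
    pair = (f"lsst_y1_source{s}", f"lsst_y1_lens{l}")
    # AND of the index constraint and the NOT of the exclusion list.
    if (s - l) in allowed_diffs and pair not in excluded:
        kept.append(pair)

print(len(kept))  # 5 + 4 + 3 pairs from the index constraint, minus 1 exclusion
```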
TypeSourceBinPairSelector: Filtering by Data Provenance
The TypeSourceBinPairSelector filters based on how the data was obtained (e.g., spectroscopic vs photometric redshifts):
from firecrown.metadata_types import TypeSource, TypeSourceBinPairSelector
# Only correlate bins with matching data source
type_source_selector = TypeSourceBinPairSelector(
type_source=TypeSource.DEFAULT
)
# Note: In our generated bins, all have the same type_source
# so this doesn't filter anything here. In real surveys, you might have:
# - TypeSource.PHOTOMETRIC for photo-z bins
# - TypeSource.SPECTROSCOPIC for spec-z bins
# - Custom TypeSource values for different surveys
type_filtered_pairs = make_binned_two_point_filtered(all_bins, type_source_selector)
print(f"Type-source filtered pairs: {len(type_filtered_pairs)}")
Type-source filtered pairs: 55
Use cases:
- Separate analyses for spectroscopic vs photometric samples
- Avoid mixing bins from different surveys with incompatible systematics
- Ensure consistent redshift calibration within correlations
Building-Block Philosophy in Practice
The power of Firecrown’s selector system comes from composition. Here’s a typical workflow:
Step 1: Start with Measurement Types
# Start with the fundamental building block
step1 = SourceLensBinPairSelector() # Selects (Source, Lens) measurement pairs
step1_pairs = make_binned_two_point_filtered(all_bins, step1)
print(f"Step 1 (measurement types only): {len(step1_pairs)} pairs")
Step 1 (measurement types only): 25 pairs
Step 2: Add Constraints Based on Your Data
# IF your bins follow a naming convention, add index-based filtering
step2 = (
SourceLensBinPairSelector()
& CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
)
step2_pairs = make_binned_two_point_filtered(all_bins, step2)
print(f"Step 2 (+ index constraint): {len(step2_pairs)} pairs")
print(f" Filtered out {len(step1_pairs) - len(step2_pairs)} pairs")
Step 2 (+ index constraint): 9 pairs
 Filtered out 16 pairs
Step 3: Refine Based on Data Quality
# Remove specific pairs if needed
step3 = (
SourceLensBinPairSelector()
& CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
& ~NamedBinPairSelector(
names=[
("lsst_y1_source4", "lsst_y1_lens4"), # Example problematic pair
]
)
)
step3_pairs = make_binned_two_point_filtered(all_bins, step3)
print(f"Step 3 (+ manual exclusions): {len(step3_pairs)} pairs")
print(f" Filtered out {len(step2_pairs) - len(step3_pairs)} additional pairs")
Step 3 (+ manual exclusions): 8 pairs
 Filtered out 1 additional pairs
If you find yourself using the same complex selector combination repeatedly, consider creating a custom CompositeSelector class:
from firecrown.metadata_types import CompositeSelector, register_bin_pair_selector
from typing import Any
@register_bin_pair_selector
class MyCustomSelector(CompositeSelector):
    """Custom selector for my specific analysis."""

    kind: str = "my-custom-selector"
    max_separation: int = 2

    def model_post_init(self, _: Any, /) -> None:
        self._impl = (
            SourceLensBinPairSelector()
            & CrossNameDiffBinPairSelector(
                neighbors_diff=list(range(0, self.max_separation + 1))  # 0, 1, ..., max_separation
            )
        )
This makes your code more maintainable and allows you to serialize/deserialize the selector configuration.
Practical Example: Optimizing Signal-to-Noise
Combine physical constraints with signal-to-noise optimization:
# High-S/N selection: nearby bins only, exclude far-separated pairs
high_sn_3x2pt = (
ThreeTwoBinPairSelector(
source_dist=2, # Very conservative
lens_dist=2,
source_lens_dist=1
)
# Could add more constraints here
)
high_sn_pairs = make_binned_two_point_filtered(all_bins, high_sn_3x2pt)
print(f"\nHigh-S/N 3×2pt selection: {len(high_sn_pairs)} pairs")
print("Trade-off: Fewer pairs but stronger signals per pair")
High-S/N 3×2pt selection: 28 pairs
Trade-off: Fewer pairs but stronger signals per pair
Physical 3×2pt Selection with Redshift Constraints
The ThreeTwoBinPairSelector implements the standard “3×2pt” cosmological analysis, which combines:
- Cosmic shear (source × source): Weak lensing auto-correlations
- Galaxy clustering (lens × lens): Galaxy position auto-correlations
- Galaxy-galaxy lensing (source × lens): Cross-correlations
Crucially, it uses NameDiffBinPairSelector internally to enforce physical redshift constraints.
Understanding the Distance Parameters
The selector has three parameters controlling which bin pairs are included:
from firecrown.metadata_types import ThreeTwoBinPairSelector
# Default: fairly permissive
default_3x2pt = ThreeTwoBinPairSelector(
source_dist=5, # Cosmic shear: source_i - source_j ∈ [-5, 5]
lens_dist=5, # Clustering: lens_i - lens_j ∈ [-5, 5]
source_lens_dist=5 # Galaxy-galaxy lensing: source_i - lens_j ∈ {1,2,3,4,5}
)
default_pairs = make_binned_two_point_filtered(all_bins, default_3x2pt)
print(f"Default 3×2pt: {len(default_pairs)} pairs")
Default 3×2pt: 40 pairs
What each parameter does:
- source_dist: For cosmic shear, controls which source bin index pairs are included
  - Difference: source_i - source_j ∈ [-source_dist, source_dist]
  - Example: source_dist=5 allows differences in {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5}
  - Larger values → more cross-redshift correlations → more data, but weaker signals
- lens_dist: For galaxy clustering, same logic for lens bin pairs
  - Difference: lens_i - lens_j ∈ [-lens_dist, lens_dist]
- source_lens_dist: For galaxy-galaxy lensing, enforces source_index - lens_index ∈ {1, 2, ..., source_lens_dist}
  - Does NOT include 0: Excludes same-index pairs to avoid mixing source/lens from same redshift bin
  - Only positive differences: Ensures lens is at lower redshift than source
  - Physically motivated: Lens must be in front of source
Why limit redshift separation?
- Signal-to-noise decreases for widely separated bins
- Computational cost increases with more pairs
- Very distant bins may have negligible correlation
Why exclude same-index source-lens pairs?
- Standard practice: source and lens samples are typically drawn from the same parent population, just split by different selection criteria
- Correlating them at the same redshift would mix the selection effects in complicated ways
- Different analyses may handle this differently, but ThreeTwoBinPairSelector follows the conservative standard
Tuning for Your Analysis
# Conservative: only nearby bins (highest S/N)
conservative_3x2pt = ThreeTwoBinPairSelector(
    source_dist=2,  # Source bins up to 2 indices apart
    lens_dist=2,  # Lens bins up to 2 indices apart
    source_lens_dist=2  # Lens 1 or 2 indices below the source
)
conservative_pairs = make_binned_two_point_filtered(all_bins, conservative_3x2pt)
# Aggressive: More cross-correlations (more data, weaker signals)
aggressive_3x2pt = ThreeTwoBinPairSelector(
source_dist=10,
lens_dist=10,
source_lens_dist=8
)
aggressive_pairs = make_binned_two_point_filtered(all_bins, aggressive_3x2pt)
print(f"Conservative 3×2pt: {len(conservative_pairs)} pairs")
print(f"Default 3×2pt: {len(default_pairs)} pairs")
print(f"Aggressive 3×2pt: {len(aggressive_pairs)} pairs")
Conservative 3×2pt: 31 pairs
Default 3×2pt: 40 pairs
Aggressive 3×2pt: 40 pairs
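The aggressive configuration returns the same 40 pairs as the default because, with five bins per sample, no index difference can exceed 4, so any distance of 4 or more already admits every pair. The sketch below reproduces the three counts in plain Python. It assumes five source and five lens bins indexed 0 to 4, unordered pairs within a sample, and the index-difference rules described for the three distance parameters; `count_3x2pt` is a hypothetical helper, not part of Firecrown.

```python
def count_3x2pt(n_source, n_lens, source_dist, lens_dist, source_lens_dist):
    """Count bin pairs under the index-difference rules (toy model)."""
    # Same-sample pairs: unordered (i <= j), index gap within the limit.
    shear = sum(
        1 for i in range(n_source) for j in range(i, n_source) if j - i <= source_dist
    )
    clustering = sum(
        1 for i in range(n_lens) for j in range(i, n_lens) if j - i <= lens_dist
    )
    # Galaxy-galaxy lensing: source index strictly above lens index.
    ggl = sum(
        1
        for s in range(n_source)
        for l in range(n_lens)
        if 1 <= s - l <= source_lens_dist
    )
    return shear + clustering + ggl


conservative = count_3x2pt(5, 5, source_dist=2, lens_dist=2, source_lens_dist=2)
default = count_3x2pt(5, 5, source_dist=5, lens_dist=5, source_lens_dist=5)
aggressive = count_3x2pt(5, 5, source_dist=10, lens_dist=10, source_lens_dist=8)
print(conservative, default, aggressive)
```

Under these assumptions, increasing the distances beyond 4 only matters for data sets with more than five bins per sample.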
Verifying Physical Consistency
Let’s verify that galaxy-galaxy lensing pairs have lens at lower indices:
import pandas as pd
from IPython.display import Markdown, display
import re
# Extract galaxy-galaxy lensing pairs
gl_pairs = [
p for p in default_pairs
if "source" in p.x.bin_name and "lens" in p.y.bin_name
]
# Parse indices
def get_index(bin_name):
    match = re.search(r'(\d+)$', bin_name)
    return int(match.group(1)) if match else None
gl_table = [
{
"source-bin": p.x.bin_name,
"lens-bin": p.y.bin_name,
"source-idx": get_index(p.x.bin_name),
"lens-idx": get_index(p.y.bin_name),
"diff": get_index(p.x.bin_name) - get_index(p.y.bin_name),
}
for p in gl_pairs[:10] # Show first 10
]
df = pd.DataFrame(gl_table)
print("\nGalaxy-galaxy lensing pairs (source × lens):")
print("Note: diff = source_idx - lens_idx should be positive (lens at lower z)\n")
display(Markdown(df.to_markdown(index=False)))
Galaxy-galaxy lensing pairs (source × lens):
Note: diff = source_idx - lens_idx should be positive (lens at lower z)
| source-bin | lens-bin | source-idx | lens-idx | diff |
|---|---|---|---|---|
| lsst_y1_source1 | lsst_y1_lens0 | 1 | 0 | 1 |
| lsst_y1_source2 | lsst_y1_lens0 | 2 | 0 | 2 |
| lsst_y1_source2 | lsst_y1_lens1 | 2 | 1 | 1 |
| lsst_y1_source3 | lsst_y1_lens0 | 3 | 0 | 3 |
| lsst_y1_source3 | lsst_y1_lens1 | 3 | 1 | 2 |
| lsst_y1_source3 | lsst_y1_lens2 | 3 | 2 | 1 |
| lsst_y1_source4 | lsst_y1_lens0 | 4 | 0 | 4 |
| lsst_y1_source4 | lsst_y1_lens1 | 4 | 1 | 3 |
| lsst_y1_source4 | lsst_y1_lens2 | 4 | 2 | 2 |
| lsst_y1_source4 | lsst_y1_lens3 | 4 | 3 | 1 |
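The same check can be automated so that a violation fails loudly instead of relying on visual inspection. The sketch below works on plain `(source_name, lens_name)` tuples, here copied from the first rows of the table. `trailing_index` and `ordering_violations` are hypothetical helpers, and the check is only meaningful if bin indices increase with redshift.

```python
import re


def trailing_index(bin_name: str) -> int:
    """Return the integer at the end of a bin name, e.g. 'lsst_y1_lens3' -> 3."""
    match = re.search(r"(\d+)$", bin_name)
    if match is None:
        raise ValueError(f"no trailing index in bin name {bin_name!r}")
    return int(match.group(1))


def ordering_violations(pairs):
    """List the (source, lens) pairs where the lens index is not below the source index."""
    return [
        (source, lens)
        for source, lens in pairs
        if trailing_index(source) - trailing_index(lens) <= 0
    ]


table_pairs = [
    ("lsst_y1_source1", "lsst_y1_lens0"),
    ("lsst_y1_source2", "lsst_y1_lens0"),
    ("lsst_y1_source2", "lsst_y1_lens1"),
    ("lsst_y1_source3", "lsst_y1_lens0"),
    ("lsst_y1_source3", "lsst_y1_lens1"),
    ("lsst_y1_source3", "lsst_y1_lens2"),
]

violations = ordering_violations(table_pairs)
assert not violations, f"lens not in front of source for: {violations}"

# A deliberately wrong pair is reported rather than silently accepted.
bad = ordering_violations([("lsst_y1_source1", "lsst_y1_lens3")])
```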
When generating metadata from scratch, apply selectors to control which pairs are created:
import numpy as np
from firecrown.metadata_types import TwoPointHarmonic, AutoBinPairSelector
# Create selector for auto-correlations (name AND measurement)
auto_both_selector = AutoBinPairSelector()
# Filter pairs during generation
auto_both_pairs = make_binned_two_point_filtered(all_bins, auto_both_selector)
# Create harmonic-space metadata
ells = np.unique(np.geomspace(2, 2000, 128).astype(int))
auto_harmonic_metadata = [TwoPointHarmonic(XY=xy, ells=ells) for xy in auto_both_pairs]
print(f"Auto-correlation harmonic metadata: {len(auto_harmonic_metadata)}")
Auto-correlation harmonic metadata: 10
Using Selectors with SACC Extraction
When extracting from SACC files, apply selectors to filter which correlations are loaded:
from firecrown.likelihood.factories import load_sacc_data
from firecrown.metadata_functions import extract_all_real_metadata
# Load SACC file
sacc_data = load_sacc_data("../tests/sacc_data.hdf5")
# Extract only source auto-correlations
source_auto_selector = SourceBinPairSelector() & AutoNameBinPairSelector()
source_auto_metadata = extract_all_real_metadata(
sacc_data, bin_pair_selector=source_auto_selector
)
print(f"Source auto-correlations from SACC: {len(source_auto_metadata)}")
Source auto-correlations from SACC: 8
Display the source auto-correlations:
import pandas as pd
from IPython.display import Markdown
source_auto_table = [
{
"bin-x": real.XY.x.bin_name,
"bin-y": real.XY.y.bin_name,
"measurement-x": str(real.XY.x_measurement),
"measurement-y": str(real.XY.y_measurement),
}
for real in source_auto_metadata
]
df = pd.DataFrame(source_auto_table)
Markdown(df.to_markdown(index=False))
| bin-x | bin-y | measurement-x | measurement-y |
|---|---|---|---|
| src0 | src0 | Galaxies.PART_OF_XI_MINUS | Galaxies.PART_OF_XI_MINUS |
| src1 | src1 | Galaxies.PART_OF_XI_MINUS | Galaxies.PART_OF_XI_MINUS |
| src2 | src2 | Galaxies.PART_OF_XI_MINUS | Galaxies.PART_OF_XI_MINUS |
| src3 | src3 | Galaxies.PART_OF_XI_MINUS | Galaxies.PART_OF_XI_MINUS |
| src0 | src0 | Galaxies.PART_OF_XI_PLUS | Galaxies.PART_OF_XI_PLUS |
| src1 | src1 | Galaxies.PART_OF_XI_PLUS | Galaxies.PART_OF_XI_PLUS |
| src2 | src2 | Galaxies.PART_OF_XI_PLUS | Galaxies.PART_OF_XI_PLUS |
| src3 | src3 | Galaxies.PART_OF_XI_PLUS | Galaxies.PART_OF_XI_PLUS |
Compare with extracting all data:
# Extract everything (no selector)
all_metadata = extract_all_real_metadata(sacc_data)
print(f"All correlations in SACC: {len(all_metadata)}")
print(f"Filtered to source auto: {len(source_auto_metadata)}")
All correlations in SACC: 45
Filtered to source auto: 8
Serialization
Selectors can be serialized to YAML for reuse:
from firecrown.utils import base_model_to_yaml, base_model_from_yaml
# Create a complex selector
selector = (AutoNameBinPairSelector() & SourceBinPairSelector()) | LensBinPairSelector()
# Serialize to YAML
selector_yaml = base_model_to_yaml(selector)
print("Serialized selector:")
print(selector_yaml)
Serialized selector:
kind: or
pair_selectors:
- kind: and
  pair_selectors:
  - {kind: auto-name}
  - {kind: source}
- {kind: lens}
Load from YAML:
from firecrown.metadata_types import BinPairSelector
# Deserialize
loaded_selector = base_model_from_yaml(BinPairSelector, selector_yaml)
# Use the loaded selector
loaded_pairs = make_binned_two_point_filtered(all_bins, loaded_selector)
print(f"Pairs from loaded selector: {len(loaded_pairs)}")
Pairs from loaded selector: 20
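To make the nested structure of the serialized form concrete, the sketch below evaluates an equivalent Python dictionary recursively. It is a toy interpreter, not how Firecrown deserializes selectors: the `evaluate` function, the `(name, sample)` bin representation, and the setup of five source and five lens bins are all assumptions for illustration.

```python
from itertools import combinations_with_replacement

# Dictionary mirroring the serialized selector: (auto-name AND source) OR lens.
spec = {
    "kind": "or",
    "pair_selectors": [
        {"kind": "and", "pair_selectors": [{"kind": "auto-name"}, {"kind": "source"}]},
        {"kind": "lens"},
    ],
}


def evaluate(node, pair):
    """Recursively decide whether a pair of (name, sample) bins is kept."""
    (name1, sample1), (name2, sample2) = pair
    kind = node["kind"]
    if kind == "or":
        return any(evaluate(child, pair) for child in node["pair_selectors"])
    if kind == "and":
        return all(evaluate(child, pair) for child in node["pair_selectors"])
    if kind == "auto-name":
        return name1 == name2
    if kind == "source":
        return sample1 == sample2 == "source"
    if kind == "lens":
        return sample1 == sample2 == "lens"
    raise ValueError(f"unknown selector kind: {kind}")


# Assumption: five source and five lens bins, unordered pairs including autos.
bins = [(f"{sample}{i}", sample) for sample in ("source", "lens") for i in range(5)]
pairs = list(combinations_with_replacement(bins, 2))
kept = [p for p in pairs if evaluate(spec, p)]
print(len(pairs), len(kept))
```

Under these assumptions the toy model keeps the five source auto-correlations plus all fifteen lens-lens pairs, which agrees with the 20 pairs reported for the loaded selector.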
Summary
Bin pair selectors provide a declarative way to control which bin pairs are included in your analysis:
- Atomic selectors: Auto, cross, source, lens, named pairs, index-based constraints
- Physical relevance: Index-based selectors enforce redshift ordering conventions
- Building blocks: Combine atomic selectors for domain-specific physics
- Logical operators: AND (&), OR (|), NOT (~)
- Integration: Works with both generators and SACC extraction
- Serialization: Save and load selector configurations
Key Selectors for Physical Constraints
All (Source, Lens) pairs
- Selector: SourceLensBinPairSelector
- Notes: Fundamental building block
Galaxy-galaxy lensing (with naming convention)
- Selector: SourceLensBinPairSelector() & CrossNameDiffBinPairSelector(neighbors_diff=[0, 1])
- Notes: Assumes index ∝ redshift
Nearby bins (with naming convention)
- Selector: AutoNameDiffBinPairSelector(neighbors_diff=[0, 1, -1])
- Notes: Limits index differences
Same data source
- Selector: TypeSourceBinPairSelector(type_source=…)
- Notes: Filter by provenance
Standard 3×2pt (with naming convention)
- Selector: ThreeTwoBinPairSelector()
- Notes: Pre-built composite
Custom high-S/N
- Selector: ThreeTwoBinPairSelector(source_dist=2, lens_dist=2, source_lens_dist=1)
- Notes: Conservative distances
Key Functions
| Function | Purpose | Selector Parameter |
|---|---|---|
| make_binned_two_point_filtered | Filter generated bins | Required |
| filter_two_point_combinations | Filter existing combinations | Required |
| extract_all_photoz_bin_combinations | Filter SACC bin combinations | Optional |
| extract_all_harmonic_metadata | Filter SACC harmonic data | Optional |
| extract_all_real_metadata | Filter SACC real-space data | Optional |
Critical Takeaways
- Start with measurement types: SourceLensBinPairSelector selects (Source, Lens) pairs
- Add constraints as appropriate: Use naming conventions, manual lists, or (in the future) dndz inspection
- Building blocks enable flexibility: Compose atomic selectors for your specific requirements
- Different approaches are valid: Index-based filtering is convenient but not the only way
- Start simple, add complexity: Begin with measurement-type selectors, then add constraints
- Understand your data: Know whether your bins follow naming conventions or need other filtering
- Verify your selection: Check that the resulting pairs make sense for your analysis
- Document your choices: Be explicit about assumptions (e.g., “assumes index ∝ redshift”)
- Create custom composites: If you reuse a pattern, make it a CompositeSelector
Next Steps
Now that you understand bin pair selectors:
- Generators: Apply selectors when generating metadata
- Loading SACC Data: Apply selectors when extracting from SACC
- Workflow Guide: See selectors in the complete analysis workflow
- Factory Basics: Construct TwoPoint objects from filtered data