Firecrown’s Two-Point Framework

Version 1.15.1

Authors

Marc Paterno

Sandro Vitenti

Purpose of this Document

This document describes the conceptual structure behind, and design principals of, Firecrown’s framework for constructing cosmological two-point likelihoods. It is intended to provide a high-level overview of how measurements, binning schemes, and data layouts are represented and organized in the codebase. The focus is on the rationale behind each abstraction and the motivation for the chosen hierarchy. Detailed descriptions of each component are provided in the following sections.

If you are looking for practical instructions on how to use the system, you may go directly to the workflow tutorial: Two-Point Workflow.

Introduction

Firecrown’s two-point likelihood framework is built around a structured hierarchy of metadata types that describe the configuration and organization of cosmological measurements. At the foundation are basic bin descriptors, such as InferredGalaxyZDist, which define properties of individual observables or selections. These descriptors are not directly tied to specific measurements but are shared across multiple components in a complex analysis. This allows Firecrown to ensure that components that should have identical descriptors have exactly identical descriptors, by construction.

These bin descriptors are combined into TwoPointXY objects, which represent a pair of bins that are to be cross-correlated. This level abstracts the bin pairing process and ensures consistency between the inputs. It acts as a validation and grouping layer before specifying how the measurements are structured.

The framework then separates measurement types by data layout: TwoPointHarmonic for harmonic-space measurements (e.g., as a function of multipole moments \(\ell\)s) and TwoPointReal for real-space measurements (e.g., as a function of angular separation \(\theta\)). These objects describe the arrangement of data without requiring the data itself, making them suitable for forecasting or model comparison tasks.

At the top of the hierarchy is the TwoPointMeasurement, which includes the data layout (TwoPointHarmonic or TwoPointReal) and the associated measured values, either observed data or mock realizations. This layered structure allows for flexible modeling and reuse of components, avoiding duplication and enabling a clear separation between metadata, data layout, and measured values.

The framework described so far defines dataclasses for representing metadata and measured values. To compute theoretical predictions or build likelihoods, a TwoPoint object must be created. This object requires a measurement layout (TwoPointHarmonic or TwoPointReal) or a full TwoPointMeasurement, and includes all modeling assumptions—such as galaxy bias, shot noise, intrinsic alignment, and systematics. Multiple TwoPoint instances can be constructed from the same layout with different model choices. While this can be done manually, the TwoPointFactory automates the creation of TwoPoint objects from either layout or measurement objects. See Two-Point Factories for details.

The complete hierarchy is visualized in Figure 1.

graph TD
  A["Basic Bin Description<br/>e.g. InferredGalaxyZDist"]
  A --> B["TwoPointXY<br/>(bin combination and<br/>measured types)"]
  B --> C1["TwoPointHarmonic<br/>(ell layout)"]
  B --> C2["TwoPointReal<br/>(theta layout)"]
  C1 --> D["TwoPointMeasurement<br/>+ measured data"]
  C2 --> D
  F["TwoPointFactory<br/>(model choices)"]
  F --> E["TwoPoint"]
  C1 --> F
  C2 --> F
  D --> F

Figure 1: Hierarchy of types in the Firecrown two-point statistic framework.

Basic Bin Descriptions

The foundation of Firecrown’s two-point framework begins with a description of individual bins, represented as dataclasses. These classes encapsulate the metadata required to define a bin used in cosmological measurements.

A key example is the InferredGalaxyZDist class, which describes the inferred redshift distribution of a single bin. Each bin is labeled using a InferredGalaxyZDist.bin_name and includes arrays InferredGalaxyZDist.z and InferredGalaxyZDist.dndz representing the redshift and corresponding distribution. Additionally, it contains a set of Measurement types (e.g., Galaxies.COUNTS, Galaxies.SHEAR_T, CMB.CONVERGENCE)¹ indicating which observables were measured in this bin. It is valid to include multiple measurement types for the same bin, such as galaxy number counts and shear measured on the same galaxy subsample.

¹ Galaxies and CMB are both Python types, each representing a different kind of measurement. Galaxies.COUNTS, Galaxies.SHEAR_T and CMB.CONVERGENCE are the names of values of these types, each representing a specific subtype of measurement. New types of measurements can be added to Firecrown by defining a new enumeration type. New values of any enumeration type can also be added. Such additions are made in firecrown/metadata_types.py. Firecrown’s use of type checking should ensure that any code that would be invalidated by adding a new type, or value, is identified.

² More precisely, type_source is actually of type TypeSource, which is a subclass of str. This type has some additional functionality used by the metadata system for validation, and which is usually transparent to users of Firecrown.

To further differentiate among subpopulations, each bin may be tagged with a InferredGalaxyZDist.type_source. This identifier, while typically a simple string², distinguishes between subsets within the same measurement category. For example, in galaxy surveys, InferredGalaxyZDist.type_source might refer to red vs. blue galaxies, or in CMB lensing, to Planck vs. SPT data. This tag allows the modeling framework to apply different theoretical treatments or nuisance parametrization to distinct subcomponents while keeping the overall bin structure unified.

The implementation is as follows:

```python

@dataclass(frozen=True, kw_only=True)
class InferredGalaxyZDist(YAMLSerializable):
    """The class used to store the redshift resolution data for a sacc file.
    
    The sacc file is a complicated set of tracers (bins) and surveys. This class is
    used to store the redshift resolution data for a single photometric bin."""
    bin_name: str
    z: np.ndarray
    dndz: np.ndarray
    measurements: set[Measurement]
    type_source: TypeSource = TypeSource.DEFAULT

```

Subclassing YAMLSerializable enables these objects to be saved to and loaded from YAML files, facilitating configuration and reproducibility. The dataclass is frozen, indicating that InferredGalaxyZDist objects cannot be modified after creation. Modifying an InferredGalaxyZDist object that may be shared between multiple TwoPoint objects could lead to unexpected behavior. It is also marked with kw_only, ensuring that only keyword arguments are accepted during construction, thus ensuring that use of the dataclass is mostly self-documenting.

In other tutorials, we explore the following topics:

Inferred Redshift Distributions: A guide to utilizing Firecrown’s capabilities to describe galaxy redshift distributions for cosmological analyses.
Redshift Distribution Generators: Demonstrates how to generate InferredGalaxyZDist objects from LSST SRD redshift distributions.
Redshift Distribution Serialization: Explains how to serialize and deserialize InferredGalaxyZDist objects and use the ZDistLSSTSRDBin and ZDistLSSTSRDBinCollection generators dataclasses.

Bin Combinations: `TwoPointXY`

Once individual bins have been defined, they are combined to specify a particular two-point correlation measurement. This is handled by the TwoPointXY class, which represents a pairing of two bins (TwoPointXY.x and TwoPointXY.y) along with the specific observables measured in each.

This abstraction plays a critical role in separating the shared metadata of the bins from the specific measurement being constructed. A given bin (e.g., a sample of galaxies) may be involved in multiple different observables, such as number counts and shear. TwoPointXY allows each measurement pairing to be represented without duplicating bin information.

In practice, the same pair of bins may appear in multiple combinations, each corresponding to a different observable pair. For example:

bin1 = InferredGalaxyZDist(
    z=..., dndz=...,
    measurements={Galaxies.COUNTS, Galaxies.SHEAR_E}
)
bin2 = InferredGalaxyZDist(
    z=..., dndz=...,
    measurements={Galaxies.COUNTS, Galaxies.SHEAR_E}
)

Here, both bins support galaxy number counts and shear measurements. These bins can be used to construct different TwoPointXY combinations, depending on which observables are being cross-correlated:

# Counts x Counts
comb1 = TwoPointXY(
    x=bin1,
    y=bin2,
    x_measurement=Galaxies.COUNTS,
    y_measurement=Galaxies.COUNTS
)

# Shear x Counts
comb2 = TwoPointXY(
    x=bin1,
    y=bin2,
    x_measurement=Galaxies.SHEAR_E,
    y_measurement=Galaxies.COUNTS
)

Each TwoPointXY object thus uniquely defines a single combination of two bins and the associated measurements. This structure ensures that downstream objects (such as data layouts and likelihood components) have access to the full bin metadata while clearly distinguishing which observable is being used.

This also allows validation logic to be centralized: TwoPointXY enforces that the specified TwoPointXY.x_measurement and TwoPointXY.y_measurement are actually among the measurements supported by the corresponding bins.

Measurement Layout: `TwoPointHarmonic` and `TwoPointReal`

After defining a specific bin pairing and observable combination using TwoPointXY, the framework distinguishes the domain in which the measurement is made—harmonic space or real space. This distinction is captured by the TwoPointHarmonic and TwoPointReal classes, which extend the structure by incorporating the data layout.

Each of these classes includes a reference to the corresponding TwoPointXY object, preserving the bin and observable metadata while adding the angular or harmonic domain configuration. This avoids redundancy and keeps the measurement layout modular.

In harmonic space, the measurement is characterized by a set of multipole orders (\(\ell\)s) at which the multipole moments (\(C_\ell\)s) of the measured quantity are calculated. Optionally, the layout can include a window function to select the orders of interest. For example:

meta1 = TwoPointHarmonic(
    XY=comb1,
    ells=...,
    window=...
)

For real-space measurements, the layout is specified by a set of angular separation values (\(\theta\)):

meta2 = TwoPointReal(
    XY=comb1,
    thetas=...
)

This separation allows the same bin and observable combination (TwoPointXY) to be used in different contexts, e.g., one for computing angular power spectra and another for correlation functions, without duplicating bin-pair metadata. It also enables use cases such as theoretical predictions or forecasts where data are not yet available, but the layout must be fully defined.

Two-Point Data Container: `TwoPointMeasurement`

The previous types (InferredGalaxyZDist, TwoPointXY, TwoPointHarmonic, TwoPointReal) define the structure and layout of a two-point measurement, what is being measured and how, but do not include actual data. These layout descriptions can be used independently of any measurement data, for example, in computing theoretical predictions, plotting expectations, or running forecasts.

To represent an actual measurement, either from observational data or a simulated dataset, the framework introduces the TwoPointMeasurement class. This type links the layout with the measured data values and the corresponding covariance structure:

measurement = TwoPointMeasurement(
    data=...,
    indices=...,
    covariance_name=...,
    metadata=meta1
)

Where:

TwoPointMeasurement.data contains the array of measured values (e.g., \(C_\ell\) or \(\xi(\theta)\)).
TwoPointMeasurement.indices specify which entries in the global covariance matrix correspond to this measurement vector.
TwoPointMeasurement.covariance_name is a reference to a named covariance block in the full data vector structure.
TwoPointMeasurement.metadata is an instance of TwoPointHarmonic or TwoPointReal, fully specifying the measurement layout.

This final level completes the description by connecting the theoretical and structural metadata with real or simulated observations, enabling their use in likelihood evaluations, consistency checks, or other statistical analyses.

Summary

This framework provides a structured approach for managing two-point correlation measurements in cosmology. It separates concerns into different components, each with a specific role, and supports both theoretical predictions and actual observational data. All dataclasses below are frozen for immutability and support YAML serialization.

InferredGalaxyZDist:
- Represents the metadata for a particular bin, which includes redshift distributions, measurement types (e.g., galaxy counts, shear), and a TypeSource to distinguish subpopulations (e.g., red or blue galaxies).
TwoPointXY:
- Combines two InferredGalaxyZDist bins and specifies which measurements (e.g., galaxy counts, shear) are being cross-correlated.
- This object encapsulates the logic for validating that the selected measurements match the bins involved, avoiding duplication of bin metadata.
TwoPointHarmonic and TwoPointReal:
- Extend the TwoPointXY structure by adding the data layout: angular multipoles (\(\ell\)) and window functions for harmonic space, or angular separation values (\(\theta\)) for real space.
- These objects are used to define the measurement domain without repeating the bin-level metadata, making it easier to compute forecasts or compare theoretical models with data.
TwoPointMeasurement:
- Connects the layout (from TwoPointHarmonic or TwoPointReal) with actual measurements, including the data values, covariance structure, and indices that refer to the global covariance matrix.
- This is the final object that contains the real or mock data used for statistical analysis or likelihood evaluation, bridging the gap between theory and observed data.

Key Design Features:

Separation of Concerns: The framework separates the metadata of bins, the combinations of measurements, and the data layout, aiming to promote modularity and to reduce redundancy.
Reusability: Each component can be reused in various combinations for different observables and data layouts (e.g., counts vs. shear, harmonic vs. real space).
Validation: Validation is handled at the appropriate levels, such as ensuring that measurements are valid for the bins and that layout parameters are correctly specified.
Forecasting and Prediction: The framework is designed to support theoretical predictions (e.g., forecasting angular power spectra) even when real data are not available.
Covariance Integration: The inclusion of indices and covariance names in TwoPointMeasurement ensures that each measurement aligns with the global data structure and covariance matrix, which is essential for likelihood evaluations.

Overall Architecture:

The framework aims for the easy construction of two-point correlation measurements with flexible data layouts, enabling both theoretical predictions and analysis of real observational data. Its modular design ensures that it can scale for complex cosmological analyses, providing a strong foundation for data confrontation and model testing.