def doc_theme():
return theme_minimal() + theme(
panel_grid_minor=element_line(color="gray", linetype="--"),
)Firecrown’s Two-Point Framework
version 1.14.3
Purpose of this Document
This document describes the conceptual structure behind, and design principals of, Firecrown’s framework for constructing cosmological two-point likelihoods. It is intended to provide a high-level overview of how measurements, binning schemes, and data layouts are represented and organized in the codebase. The focus is on the rationale behind each abstraction and the motivation for the chosen hierarchy. Detailed descriptions of each component are provided in the following sections.
Introduction
Firecrown’s two-point likelihood framework is built around a structured hierarchy of metadata types that describe the configuration and organization of cosmological measurements. At the foundation are basic bin descriptors, such as InferredGalaxyZDist, which define properties of individual observables or selections. These descriptors are not directly tied to specific measurements but are shared across multiple components in a complex analysis. This allows Firecrown to ensure that components that should have identical descriptors have exactly identical descriptors, by construction.
These bin descriptors are combined into TwoPointXY objects, which represent a pair of bins that are to be cross-correlated. This level abstracts the bin pairing process and ensures consistency between the inputs. It acts as a validation and grouping layer before specifying how the measurements are structured.
The framework then separates measurement types by data layout: TwoPointHarmonic for harmonic-space measurements (e.g., as a function of multipole moments \(\ell\)s) and TwoPointReal for real-space measurements (e.g., as a function of angular separation \(\theta\)). These objects describe the arrangement of data without requiring the data itself, making them suitable for forecasting or model comparison tasks.
At the top of the hierarchy is the TwoPointMeasurement, which includes the data layout (TwoPointHarmonic or TwoPointReal) and the associated measured values, either observed data or mock realizations. This layered structure allows for flexible modeling and reuse of components, avoiding duplication and enabling a clear separation between metadata, data layout, and measured values.
The framework described so far defines dataclasses for representing metadata and measured values. To compute theoretical predictions or build likelihoods, a TwoPoint object must be created. This object requires a measurement layout (TwoPointHarmonic or TwoPointReal) or a full TwoPointMeasurement, and includes all modeling assumptions—such as galaxy bias, shot noise, intrinsic alignment, and systematics. Multiple TwoPoint instances can be constructed from the same layout with different model choices. While this can be done manually, the TwoPointFactory automates the creation of TwoPoint objects from either layout or measurement objects. See Two-Point Factories for details.
The complete hierarchy is visualized in Figure 1.
graph TD A["Basic Bin Description<br/>e.g. InferredGalaxyZDist"] A --> B["TwoPointXY<br/>(bin combination and<br/>measured types)"] B --> C1["TwoPointHarmonic<br/>(ell layout)"] B --> C2["TwoPointReal<br/>(theta layout)"] C1 --> D["TwoPointMeasurement<br/>+ measured data"] C2 --> D F["TwoPointFactory<br/>(model choices)"] F --> E["TwoPoint"] C1 --> F C2 --> F D --> F
Basic Bin Descriptions
The foundation of Firecrown’s two-point framework begins with a description of individual bins, represented as dataclasses. These classes encapsulate the metadata required to define a bin used in cosmological measurements.
A key example is the InferredGalaxyZDist class, which describes the inferred redshift distribution of a single bin. Each bin is labeled using a InferredGalaxyZDist.bin_name and includes arrays InferredGalaxyZDist.z and InferredGalaxyZDist.dndz representing the redshift and corresponding distribution. Additionally, it contains a set of Measurement types (e.g., Galaxies.COUNTS, Galaxies.SHEAR_T, CMB.CONVERGENCE)1 indicating which observables were measured in this bin. It is valid to include multiple measurement types for the same bin, such as galaxy number counts and shear measured on the same galaxy subsample.
1 Galaxies and CMB are both Python types, each representing a different kind of measurement. Galaxies.COUNTS, Galaxies.SHEAR_T and CMB.CONVERGENCE are the names of values of these types, each representing a specific subtype of measurement. New types of measurements can be added to Firecrown by defining a new enumeration type. New values of any enumeration type can also be added. Such additions are made in firecrown/metadata_types.py. Firecrown’s use of type checking should ensure that any code that would be invalidated by adding a new type, or value, is identified.
2 More precisely, type_source is actually of type TypeSource, which is a subclass of str. This type has some additional functionality used by the metadata system for validation, and which is usually transparent to users of Firecrown.
To further differentiate among subpopulations, each bin may be tagged with a InferredGalaxyZDist.type_source. This identifier, while typically a simple string2, distinguishes between subsets within the same measurement category. For example, in galaxy surveys, InferredGalaxyZDist.type_source might refer to red vs. blue galaxies, or in CMB lensing, to Planck vs. SPT data. This tag allows the modeling framework to apply different theoretical treatments or nuisance parameterizations to distinct subcomponents while keeping the overall bin structure unified.
The implementation is as follows:
```python
@dataclass(frozen=True, kw_only=True) class InferredGalaxyZDist(YAMLSerializable): """The class used to store the redshift resolution data for a sacc file. The sacc file is a complicated set of tracers (bins) and surveys. This class is used to store the redshift resolution data for a single photometric bin.""" bin_name: str z: np.ndarray dndz: np.ndarray measurements: set[Measurement] type_source: TypeSource = TypeSource.DEFAULT
```
Subclassing YAMLSerializable enables these objects to be saved to and loaded from YAML files, facilitating configuration and reproducibility. The dataclass is frozen, indicating that InferredGalaxyZDist objects cannot be modified after creation. Modifying an InferredGalaxyZDist object that may be shared between multiple TwoPoint objects could lead to unexpected behavior. It is also marked with kw_only, ensuring that only keyword arguments are accepted during construction, thus ensuring that use of the dataclass is mostly self-documenting.
In other tutorials, we explore the following topics:
Inferred Redshift Distributions: A guide to utilizing Firecrown’s capabilities to describe galaxy redshift distributions for cosmological analyses.
Redshift Distribution Generators: Demonstrates how to generate
InferredGalaxyZDistobjects from LSST SRD redshift distributions.Redshift Distribution Serialization: Explains how to serialize and deserialize
InferredGalaxyZDistobjects and use theZDistLSSTSRDBinandZDistLSSTSRDBinCollectiongenerators dataclasses.
Bin Combinations: TwoPointXY
Once individual bins have been defined, they are combined to specify a particular two-point correlation measurement. This is handled by the TwoPointXY class, which represents a pairing of two bins (TwoPointXY.x and TwoPointXY.y) along with the specific observables measured in each.
This abstraction plays a critical role in separating the shared metadata of the bins from the specific measurement being constructed. A given bin (e.g., a sample of galaxies) may be involved in multiple different observables, such as number counts and shear. TwoPointXY allows each measurement pairing to be represented without duplicating bin information.
In practice, the same pair of bins may appear in multiple combinations, each corresponding to a different observable pair. For example:
bin1 = InferredGalaxyZDist(
z=..., dndz=...,
measurements={Galaxies.COUNTS, Galaxies.SHEAR_E}
)
bin2 = InferredGalaxyZDist(
z=..., dndz=...,
measurements={Galaxies.COUNTS, Galaxies.SHEAR_E}
)Here, both bins support galaxy number counts and shear measurements. These bins can be used to construct different TwoPointXY combinations, depending on which observables are being cross-correlated:
# Counts x Counts
comb1 = TwoPointXY(
x=bin1,
y=bin2,
x_measurement=Galaxies.COUNTS,
y_measurement=Galaxies.COUNTS
)
# Shear x Counts
comb2 = TwoPointXY(
x=bin1,
y=bin2,
x_measurement=Galaxies.SHEAR_E,
y_measurement=Galaxies.COUNTS
)Each TwoPointXY object thus uniquely defines a single combination of two bins and the associated measurements. This structure ensures that downstream objects (such as data layouts and likelihood components) have access to the full bin metadata while clearly distinguishing which observable is being used.
This also allows validation logic to be centralized: TwoPointXY enforces that the specified TwoPointXY.x_measurement and TwoPointXY.y_measurement are actually among the measurements supported by the corresponding bins.
Measurement Layout: TwoPointHarmonic and TwoPointReal
After defining a specific bin pairing and observable combination using TwoPointXY, the framework distinguishes the domain in which the measurement is made—harmonic space or real space. This distinction is captured by the TwoPointHarmonic and TwoPointReal classes, which extend the structure by incorporating the data layout.
Each of these classes includes a reference to the corresponding TwoPointXY object, preserving the bin and observable metadata while adding the angular or harmonic domain configuration. This avoids redundancy and keeps the measurement layout modular.
In harmonic space, the measurement is characterized by a set of multipole orders (\(\ell\)s) at which the multipole moments (\(C_\ell\)s) of the measured quantity are calculated. Optionally, the layout can include a window function to select the orders of interest. For example:
meta1 = TwoPointHarmonic(
XY=comb1,
ells=...,
window=...
)For real-space measurements, the layout is specified by a set of angular separation values (\(\theta\)):
meta2 = TwoPointReal(
XY=comb1,
thetas=...
)This separation allows the same bin and observable combination (TwoPointXY) to be used in different contexts, e.g., one for computing angular power spectra and another for correlation functions, without duplicating bin-pair metadata. It also enables use cases such as theoretical predictions or forecasts where data are not yet available, but the layout must be fully defined.
Two-Point Data Container: TwoPointMeasurement
The previous types (InferredGalaxyZDist, TwoPointXY, TwoPointHarmonic, TwoPointReal) define the structure and layout of a two-point measurement, what is being measured and how, but do not include actual data. These layout descriptions can be used independently of any measurement data, for example, in computing theoretical predictions, plotting expectations, or running forecasts.
To represent an actual measurement, either from observational data or a simulated dataset, the framework introduces the TwoPointMeasurement class. This type links the layout with the measured data values and the corresponding covariance structure:
measurement = TwoPointMeasurement(
data=...,
indices=...,
covariance_name=...,
metadata=meta1
)Where:
TwoPointMeasurement.datacontains the array of measured values (e.g., \(C_\ell\) or \(\xi(\theta)\)).TwoPointMeasurement.indicesspecify which entries in the global covariance matrix correspond to this measurement vector.TwoPointMeasurement.covariance_nameis a reference to a named covariance block in the full data vector structure.TwoPointMeasurement.metadatais an instance ofTwoPointHarmonicorTwoPointReal, fully specifying the measurement layout.
This final level completes the description by connecting the theoretical and structural metadata with real or simulated observations, enabling their use in likelihood evaluations, consistency checks, or other statistical analyses.
Two-Point Factories
The TwoPointHarmonic and TwoPointReal classes describe the layout of two-point measurements in harmonic and real space, respectively. They specify what is being measured and how, but do not include actual data. These layout objects can be used in two distinct contexts:
- To build a
TwoPointMeasurement, which includes data and indices into a covariance matrix. - To initialize a
TwoPointobject, which encapsulates all modeling choices and can be used to compute theoretical predictions and/or contribute to a likelihood.
A TwoPoint object can be constructed from either a layout (TwoPointHarmonic or TwoPointReal) or a measurement (TwoPointMeasurement). In both cases, it computes a TheoryVector. If the object includes data (i.e., initialized from a TwoPointMeasurement), it also contains a DataVector. As a subclass of Firecrown’s Statistic, a TwoPoint can be included in a Likelihood; see Introduction for further details.
To simplify the construction of TwoPoint objects from layouts, Firecrown provides the TwoPointFactory. This factory automates source creation based on metadata found in the layout description. Specifically, it inspects the measurement types in each TwoPointXY (i.e., the TwoPointXY.x_measurement and TwoPointXY.y_measurement fields) and delegates to appropriate source factories—such as WeakLensingFactory or NumberCountsFactory.
A subset of Measurement types is associated with each source factory. For example:
Galaxies.COUNTS→NumberCountsFactoryGalaxies.SHEAR_E|Galaxies.SHEAR_T| ...→WeakLensingFactory
The TwoPointFactory can hold multiple instances of the same source factory, each tied to a different TypeSource. This allows distinct modeling choices for different subpopulations (e.g., red vs. blue galaxies), even when the measurement type is the same. By default, both basic bins and factories use TypeSource.DEFAULT, so in simpler analyses the TypeSource distinction can be ignored.
See Two-Point Factories for implementation and usage details.
Summary
This framework provides a structured approach for managing two-point correlation measurements in cosmology. It separates concerns into different components, each with a specific role, and supports both theoretical predictions and actual observational data. All dataclasses below are frozen for immutability and support YAML serialization.
InferredGalaxyZDist:- Represents the metadata for a particular bin, which includes redshift distributions, measurement types (e.g., galaxy counts, shear), and a
TypeSourceto distinguish subpopulations (e.g., red or blue galaxies).
- Represents the metadata for a particular bin, which includes redshift distributions, measurement types (e.g., galaxy counts, shear), and a
TwoPointXY:- Combines two
InferredGalaxyZDistbins and specifies which measurements (e.g., galaxy counts, shear) are being cross-correlated. - This object encapsulates the logic for validating that the selected measurements match the bins involved, avoiding duplication of bin metadata.
- Combines two
TwoPointHarmonicandTwoPointReal:- Extend the
TwoPointXYstructure by adding the data layout: angular multipoles (\(\ell\)) and window functions for harmonic space, or angular separation values (\(\theta\)) for real space. - These objects are used to define the measurement domain without repeating the bin-level metadata, making it easier to compute forecasts or compare theoretical models with data.
- Extend the
TwoPointMeasurement:- Connects the layout (from
TwoPointHarmonicorTwoPointReal) with actual measurements, including the data values, covariance structure, and indices that refer to the global covariance matrix. - This is the final object that contains the real or mock data used for statistical analysis or likelihood evaluation, bridging the gap between theory and observed data.
- Connects the layout (from
Key Design Features:
- Separation of Concerns: The framework separates the metadata of bins, the combinations of measurements, and the data layout, aiming to promote modularity and to reduce redundancy.
- Reusability: Each component can be reused in various combinations for different observables and data layouts (e.g., counts vs. shear, harmonic vs. real space).
- Validation: Validation is handled at the appropriate levels, such as ensuring that measurements are valid for the bins and that layout parameters are correctly specified.
- Forecasting and Prediction: The framework is designed to support theoretical predictions (e.g., forecasting angular power spectra) even when real data are not available.
- Covariance Integration: The inclusion of indices and covariance names in
TwoPointMeasurementensures that each measurement aligns with the global data structure and covariance matrix, which is essential for likelihood evaluations.
Overall Architecture:
The framework aims for the easy construction of two-point correlation measurements with flexible data layouts, enabling both theoretical predictions and analysis of real observational data. Its modular design ensures that it can scale for complex cosmological analyses, providing a strong foundation for data confrontation and model testing.