Using Firecrown Factories to Initialize Two-Point Objects

version 1.10.0

Authors

Marc Paterno

Sandro Vitenti

Purpose of this Document

In the tutorial Two Point Generators we show how to use generators to create the necessary metadata to instantiate TwoPoint objects. In addition to generators, the user has also the option to use data already saved in a SACC object (or file) to initialize the TwoPoint objects. In this tutorial we show how to extract the metadata and data from SACC object and use the metadata and a factory to initialize the objects.

`SACC` objects

The SACC objects are used to store all necessary information for a statistical analysis. In practice, it stores metadata (data layout, data type, bins, dependent variables, etc), calibration data (redshift distribution per bin \(\mathrm{d}n/\mathrm{d}z\)) and the covariance between all measurements. In the tutorials InferredGalaxyZDist, InferredGalaxyZDist Generators and InferredGalaxyZDist Serialization we show how to use, generate and serialize the InferredGalaxyZDist objects describing the redshift distributions. In this document we extract these components from a SACC object.

Metadata only `SACC`

Up to the version 1.7 Firecrown relied on a two phase construction for its likelihoods. During the first phase, the Likelihood, Statistics and Systematics objects are created with metadata only. Then, the user must call the read method passing a SACC object to finish the object construction.

One shortcoming of this methodology is that the user must know the SACC object struct beforehand, so they can create the necessary objects. Then, a matching SACC object must be passed in the read phase. To simplify this, we introduced functions to extract the metadata from the SACC file, such that it can be used toguether with the factories to create the objects.

In the code below we extract the metadata indices from the examples/des_y1_3x2pt SACC file.

from firecrown.metadata_functions import extract_all_real_metadata_indices
import sacc

sacc_data = sacc.Sacc.load_fits("../examples/des_y1_3x2pt/sacc_data.fits")

all_meta = extract_all_real_metadata_indices(sacc_data)

The metadata can be seen below:

Code

import yaml
from IPython.display import Markdown

all_meta_prune = [
    {
        "data_type": meta["data_type"],
        "tracer1": str(meta["tracer_names"].name1),
        "tracer2": str(meta["tracer_names"].name2),
    }
    for meta in all_meta
]

all_meta_yaml = yaml.safe_dump(all_meta_prune, default_flow_style=False)

Markdown(f"```yaml\n{all_meta_yaml}\n```")

- data_type: galaxy_density_xi
  tracer1: lens0
  tracer2: lens0
- data_type: galaxy_density_xi
  tracer1: lens1
  tracer2: lens1
- data_type: galaxy_density_xi
  tracer1: lens2
  tracer2: lens2
- data_type: galaxy_density_xi
  tracer1: lens3
  tracer2: lens3
- data_type: galaxy_density_xi
  tracer1: lens4
  tracer2: lens4
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens0
  tracer2: src0
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens0
  tracer2: src1
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens0
  tracer2: src2
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens0
  tracer2: src3
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens1
  tracer2: src0
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens1
  tracer2: src1
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens1
  tracer2: src2
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens1
  tracer2: src3
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens2
  tracer2: src0
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens2
  tracer2: src1
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens2
  tracer2: src2
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens2
  tracer2: src3
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens3
  tracer2: src0
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens3
  tracer2: src1
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens3
  tracer2: src2
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens3
  tracer2: src3
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens4
  tracer2: src0
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens4
  tracer2: src1
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens4
  tracer2: src2
- data_type: galaxy_shearDensity_xi_t
  tracer1: lens4
  tracer2: src3
- data_type: galaxy_shear_xi_minus
  tracer1: src0
  tracer2: src0
- data_type: galaxy_shear_xi_minus
  tracer1: src0
  tracer2: src1
- data_type: galaxy_shear_xi_minus
  tracer1: src0
  tracer2: src2
- data_type: galaxy_shear_xi_minus
  tracer1: src0
  tracer2: src3
- data_type: galaxy_shear_xi_minus
  tracer1: src1
  tracer2: src1
- data_type: galaxy_shear_xi_minus
  tracer1: src1
  tracer2: src2
- data_type: galaxy_shear_xi_minus
  tracer1: src1
  tracer2: src3
- data_type: galaxy_shear_xi_minus
  tracer1: src2
  tracer2: src2
- data_type: galaxy_shear_xi_minus
  tracer1: src2
  tracer2: src3
- data_type: galaxy_shear_xi_minus
  tracer1: src3
  tracer2: src3
- data_type: galaxy_shear_xi_plus
  tracer1: src0
  tracer2: src0
- data_type: galaxy_shear_xi_plus
  tracer1: src0
  tracer2: src1
- data_type: galaxy_shear_xi_plus
  tracer1: src0
  tracer2: src2
- data_type: galaxy_shear_xi_plus
  tracer1: src0
  tracer2: src3
- data_type: galaxy_shear_xi_plus
  tracer1: src1
  tracer2: src1
- data_type: galaxy_shear_xi_plus
  tracer1: src1
  tracer2: src2
- data_type: galaxy_shear_xi_plus
  tracer1: src1
  tracer2: src3
- data_type: galaxy_shear_xi_plus
  tracer1: src2
  tracer2: src2
- data_type: galaxy_shear_xi_plus
  tracer1: src2
  tracer2: src3
- data_type: galaxy_shear_xi_plus
  tracer1: src3
  tracer2: src3

The tracer names above are ones from the SACC object. The user needs to know the tracer names to create the TwoPoint objects. However, now the user can use the metadata to create the objects.

Factories

The factories are used to create the TwoPoint objects. In Firecrown, we write factories using Pydantic, a library to validate and parse data. This gives us the advantage of having a schema for the data, and the user can use the factories to validate the data before creating the objects.

For example, we can organize the factories in a YAML file:

from firecrown.likelihood.two_point import TwoPoint
from firecrown.likelihood.weak_lensing import WeakLensingFactory
from firecrown.likelihood.number_counts import NumberCountsFactory
from firecrown.utils import base_model_from_yaml

weak_lensing_yaml = """
per_bin_systematics:
- type: MultiplicativeShearBiasFactory
- type: PhotoZShiftFactory
global_systematics:
- type: LinearAlignmentSystematicFactory
  alphag: 1.0
"""
number_counts_yaml = """
per_bin_systematics:
- type: PhotoZShiftFactory
global_systematics: []
"""

weak_lensing_factory = base_model_from_yaml(WeakLensingFactory, weak_lensing_yaml)
number_counts_factory = base_model_from_yaml(NumberCountsFactory, number_counts_yaml)

two_point_list = TwoPoint.from_metadata_index(
    metadata_indices=all_meta,
    wl_factory=weak_lensing_factory,
    nc_factory=number_counts_factory,
)

Creating a `Likelihood` object

Now we can create a Likelihood object using the TwoPoint objects. Since we are using the metadata only, we need to pass the SACC object to the read method.

from firecrown.likelihood.gaussian import ConstGaussian

likelihood = ConstGaussian(two_point_list)

likelihood.read(sacc_data)

Extracting the Metadata and Data

In the previous section, we demonstrated how to extract metadata from a SACC object. Now, we will show how to extract all components from the SACC object, including the metadata, data, and calibration data.

In the upcoming interface for Firecrown, likelihood objects will be created using the complete data. This data may be extracted from a SACC object, generated by a custom generator, or provided in any other manner by the user.

In the code below we extract all components from the examples/des_y1_3x2pt SACC file.

from firecrown.data_functions import (
    extract_all_real_data,
    check_two_point_consistence_real,
)
from pprint import pprint

two_point_reals = extract_all_real_data(sacc_data)
check_two_point_consistence_real(two_point_reals)

The list two_point_reals contains all information about real-space two point functions contained in the SACC object. Therefore, we need to use a different constructor to obtain the two-point objects already in the ready state.

from firecrown.likelihood.two_point import TwoPoint

weak_lensing_factory = base_model_from_yaml(WeakLensingFactory, weak_lensing_yaml)
number_counts_factory = base_model_from_yaml(NumberCountsFactory, number_counts_yaml)


two_points_ready = TwoPoint.from_measurement(
    two_point_reals,
    wl_factory=weak_lensing_factory,
    nc_factory=number_counts_factory,
)

Note that we used the same factories to instantiate our TwoPoint objects.

Now, to create the likelihood we need to use a different constructor that support creating likelihoods in ready state.

from firecrown.likelihood.gaussian import ConstGaussian

likelihood_ready = ConstGaussian.create_ready(
    two_points_ready, sacc_data.covariance.dense
)

Note that, since we are creating a Likelihood in ready state, the constructor requires the statistics covariance.

Comparing results

We can now compare whether the two methods yield the same results. Before we do that, we need to obtain our required parameters.

from firecrown.parameters import ParamsMap

req_params = likelihood.required_parameters()
req_params_ready = likelihood_ready.required_parameters()

assert req_params_ready == req_params

default_values = req_params.get_default_values()
params = ParamsMap(default_values)

Note that the required parameters are the same for both likelihoods. Before generating the two-point statistics, we use the following default values:

Code

import yaml

default_values_yaml = yaml.dump(default_values, default_flow_style=False)

Markdown(f"```yaml\n{default_values_yaml}\n```")

alphaz: 0.0
ia_bias: 0.5
lens0_bias: 1.5
lens0_delta_z: 0.0
lens1_bias: 1.5
lens1_delta_z: 0.0
lens2_bias: 1.5
lens2_delta_z: 0.0
lens3_bias: 1.5
lens3_delta_z: 0.0
lens4_bias: 1.5
lens4_delta_z: 0.0
src0_delta_z: 0.0
src0_mult_bias: 1.0
src1_delta_z: 0.0
src1_mult_bias: 1.0
src2_delta_z: 0.0
src2_mult_bias: 1.0
src3_delta_z: 0.0
src3_mult_bias: 1.0
z_piv: 0.5

Finally, we can prepare both likelihoods and compare the loglike values.

from firecrown.modeling_tools import ModelingTools
from firecrown.ccl_factory import CCLFactory
from firecrown.updatable import get_default_params_map
from firecrown.parameters import ParamsMap
from firecrown.utils import base_model_to_yaml
from firecrown.data_functions import TwoPointBinFilterCollection, TwoPointBinFilter
from firecrown.metadata_types import Galaxies


tools = ModelingTools(ccl_factory=CCLFactory(require_nonlinear_pk=True))
params = get_default_params_map(tools, likelihood)

tools = ModelingTools()
tools.update(params)
tools.prepare()

likelihood.update(params)
likelihood_ready.update(params)

Code

print(f"Loglike from metadata only: {likelihood.compute_loglike(tools)}")
print(f"Loglike from ready state: {likelihood_ready.compute_loglike(tools)}")

Loglike from metadata only: -2742.739024737394
Loglike from ready state: -2742.739024737394

Filtering Data: Scale-cuts

Real analyses use only a subset of the measured two-points statistics, where the utilized data is typically limited my the accuracy of the models used to fit the data. It is then useful to define the physical scales (corresponding to the data) that should be analyzed in a given likelihood evaluation of two-point statistics. Firecrown can implement this feature though its factories, notably by defining a TwoPointBinFilterCollection object. This object is a collection of TwoPointBinFilter objects, which define the valid data analysis range for a given combination of two-point tracers. For instance, we can define the filtered range of galaxy clustering auto-correlations as follows:

tp_collection = TwoPointBinFilterCollection(
    filters=[
        TwoPointBinFilter.from_args(
            name1=f"lens{i}",
            measurement1=Galaxies.COUNTS,
            name2=f"lens{i}",
            measurement2=Galaxies.COUNTS,
            lower=2,
            upper=300,
        )
        for i in range(5)
    ],
    require_filter_for_all=True,
    allow_empty=True,
)
Markdown(f"```yaml\n{base_model_to_yaml(tp_collection)}\n```")

require_filter_for_all: true
allow_empty: true
filters:
- spec:
  - name: lens0
    measurement: {subject: Galaxies, property: COUNTS}
  - name: lens0
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
- spec:
  - name: lens1
    measurement: {subject: Galaxies, property: COUNTS}
  - name: lens1
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
- spec:
  - name: lens2
    measurement: {subject: Galaxies, property: COUNTS}
  - name: lens2
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
- spec:
  - name: lens3
    measurement: {subject: Galaxies, property: COUNTS}
  - name: lens3
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
- spec:
  - name: lens4
    measurement: {subject: Galaxies, property: COUNTS}
  - name: lens4
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]

Equivalently, we may reduce the complexity of the code slightly and specify the use of auto-correlations only:

tp_collection = TwoPointBinFilterCollection(
                filters=[
                    TwoPointBinFilter.from_args_auto(
                        name=f"lens{i}",
                        measurement=Galaxies.COUNTS,
                        lower=2,
                        upper=300,
                    )
                    for i in range(5)
                ],
                require_filter_for_all=True,
                allow_empty=True,
)
Markdown(f"```yaml\n{base_model_to_yaml(tp_collection)}\n```")

require_filter_for_all: true
allow_empty: true
filters:
- spec:
  - name: lens0
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
- spec:
  - name: lens1
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
- spec:
  - name: lens2
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
- spec:
  - name: lens3
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
- spec:
  - name: lens4
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]

One may alternatively define the tracers directly (instead of from arguments) as TwoPointTracerSpec objects.

A TwoPointExperiment object is able to keep track of the relevant Factory instances to generate the two-point configurations of the analysis (either in configuration or harmonic space) and the scale-cut/data filtering choices to evaluate a defined likelihood. The interpretation of the filtered lower and upper limits of the data depend on the definition of the TwoPointExperiment factories in either configuration or harmonic space.

With this formalism, we are able to evaluate the likelihood exactly as the previous section by defining filters to be very wide. Alternatively, by setting a restrictively small filtered range, we can remove data from the analysis and do so in the example below by filtering-out all galaxy clustering data.

from firecrown.likelihood.factories import (
    DataSourceSacc,
    TwoPointCorrelationSpace,
    TwoPointExperiment,
    TwoPointFactory,
)

tpf = TwoPointFactory(
    correlation_space=TwoPointCorrelationSpace.REAL,
    weak_lensing_factory=weak_lensing_factory,
    number_counts_factory=number_counts_factory,
)

two_point_experiment = TwoPointExperiment(
    two_point_factory=tpf,
    data_source=DataSourceSacc(
        sacc_data_file="../examples/des_y1_3x2pt/sacc_data.fits",
        filters=TwoPointBinFilterCollection(
            require_filter_for_all=False,
            allow_empty=True,
            filters=[
                TwoPointBinFilter.from_args_auto(
                    name=f"lens{i}",
                    measurement=Galaxies.COUNTS,
                    lower=0.5,
                    upper=300,
                )
                for i in range(5)
            ],
        ),
    ),
)

two_point_experiment_filtered = TwoPointExperiment(
    two_point_factory=tpf,
    data_source=DataSourceSacc(
        sacc_data_file="../examples/des_y1_3x2pt/sacc_data.fits",
        filters=TwoPointBinFilterCollection(
            require_filter_for_all=False,
            allow_empty=True,
            filters=[
                TwoPointBinFilter.from_args_auto(
                    name=f"lens{i}",
                    measurement=Galaxies.COUNTS,
                    lower=2999,
                    upper=3000,
                )
                for i in range(5)
            ],
        ),
    ),
)

The TwoPointExperiment objects can also be used to create likelihoods in the ready state. Additionally, they can be serialized into a yaml file, making it easier to share specific analysis choices with other users and collaborators.

The yaml below shows the first experiment.

Code

Markdown(f"```yaml\n{base_model_to_yaml(two_point_experiment)}\n```")

two_point_factory:
  correlation_space: REAL
  weak_lensing_factory:
    per_bin_systematics:
    - {type: MultiplicativeShearBiasFactory}
    - {type: PhotoZShiftFactory}
    global_systematics:
    - {type: LinearAlignmentSystematicFactory, alphag: 1.0}
  number_counts_factory:
    per_bin_systematics:
    - {type: PhotoZShiftFactory}
    global_systematics: []
    include_rsd: false
data_source:
  sacc_data_file: ../examples/des_y1_3x2pt/sacc_data.fits
  filters:
    require_filter_for_all: false
    allow_empty: true
    filters:
    - spec:
      - name: lens0
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [0.5, 300.0]
    - spec:
      - name: lens1
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [0.5, 300.0]
    - spec:
      - name: lens2
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [0.5, 300.0]
    - spec:
      - name: lens3
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [0.5, 300.0]
    - spec:
      - name: lens4
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [0.5, 300.0]
ccl_factory: {require_nonlinear_pk: false, amplitude_parameter: SIGMA8, mass_split: NORMAL,
  creation_mode: DEFAULT, camb_extra_params: null, ccl_spline_params: null, parameter_prefix: null}

The yaml below shows the second experiment.

Code

Markdown(f"```yaml\n{base_model_to_yaml(two_point_experiment_filtered)}\n```")

two_point_factory:
  correlation_space: REAL
  weak_lensing_factory:
    per_bin_systematics:
    - {type: MultiplicativeShearBiasFactory}
    - {type: PhotoZShiftFactory}
    global_systematics:
    - {type: LinearAlignmentSystematicFactory, alphag: 1.0}
  number_counts_factory:
    per_bin_systematics:
    - {type: PhotoZShiftFactory}
    global_systematics: []
    include_rsd: false
data_source:
  sacc_data_file: ../examples/des_y1_3x2pt/sacc_data.fits
  filters:
    require_filter_for_all: false
    allow_empty: true
    filters:
    - spec:
      - name: lens0
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [2999.0, 3000.0]
    - spec:
      - name: lens1
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [2999.0, 3000.0]
    - spec:
      - name: lens2
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [2999.0, 3000.0]
    - spec:
      - name: lens3
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [2999.0, 3000.0]
    - spec:
      - name: lens4
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [2999.0, 3000.0]
ccl_factory: {require_nonlinear_pk: false, amplitude_parameter: SIGMA8, mass_split: NORMAL,
  creation_mode: DEFAULT, camb_extra_params: null, ccl_spline_params: null, parameter_prefix: null}

Next, we can create likelihoods from the TwoPointExperiment objects and compare the loglike values.

likelihood_tpe = two_point_experiment.make_likelihood()

params = get_default_params_map(tools, likelihood_tpe)

tools = ModelingTools()
tools.update(params)
tools.prepare()
likelihood_tpe.update(params)

likelihood_tpe_filtered = two_point_experiment_filtered.make_likelihood()

params = get_default_params_map(tools, likelihood_tpe_filtered)

tools = ModelingTools()
tools.update(params)
tools.prepare()
likelihood_tpe_filtered.update(params)

Code

print(f"Loglike from metadata only: {likelihood.compute_loglike(tools)}")
print(f"Loglike from ready state: {likelihood_ready.compute_loglike(tools)}")
print(f"Loglike from TwoPointExperiment: {likelihood_tpe.compute_loglike(tools)}")
print(f"Loglike from filtered TwoPointExperiment: {likelihood_tpe_filtered.compute_loglike(tools)}")

Loglike from metadata only: -2742.739024737394
Loglike from ready state: -2742.739024737394
Loglike from TwoPointExperiment: -2742.739024737394
Loglike from filtered TwoPointExperiment: -2579.948781562013

Purpose of this Document

SACC objects

Metadata only SACC

Factories

Creating a Likelihood object

Extracting the Metadata and Data

Comparing results

Filtering Data: Scale-cuts

`SACC` objects

Metadata only `SACC`

Creating a `Likelihood` object