Scale Cuts and Data Filtering

Version ?env:FIRECROWN_VERSION

Authors

Marc Paterno

Sandro Vitenti

Purpose of this Document

This tutorial demonstrates how to apply physical scale cuts to two-point statistics using TwoPointBinFilterCollection. Scale cuts are essential for limiting analyses to scales where theoretical models are accurate.

For background on the factory system, including how to define systematics factories, see Two-Point Factory Basics. For loading SACC data, see Loading SACC Data.

Filtering Data: Scale Cuts

Real analyses use only a subset of the measured two-point statistics, typically limited to the scales where the models used to fit the data are accurate. It is therefore useful to specify which physical scales should enter a given likelihood evaluation. Firecrown supports this through its factories, using a TwoPointBinFilterCollection object.

This object is a collection of TwoPointBinFilter objects, which define the valid data analysis range for a given combination of two-point tracers. For instance, we can define the filtered range of galaxy clustering auto-correlations as follows:

from firecrown.data_functions import TwoPointBinFilterCollection, TwoPointBinFilter
from firecrown.metadata_types import Galaxies
from firecrown.utils import base_model_to_yaml
from IPython.display import Markdown

tp_collection = TwoPointBinFilterCollection(
    filters=[
        TwoPointBinFilter.from_args(
            name1=f"lens{i}",
            measurement1=Galaxies.COUNTS,
            name2=f"lens{i}",
            measurement2=Galaxies.COUNTS,
            lower=2,
            upper=300,
        )
        for i in range(5)
    ],
    require_filter_for_all=True,
    allow_empty=True,
)
Markdown(f"```yaml\n{base_model_to_yaml(tp_collection)}\n```")
require_filter_for_all: true
allow_empty: true
filters:
- spec:
  - name: lens0
    measurement: {subject: Galaxies, property: COUNTS}
  - name: lens0
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
  method: support
- spec:
  - name: lens1
    measurement: {subject: Galaxies, property: COUNTS}
  - name: lens1
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
  method: support
- spec:
  - name: lens2
    measurement: {subject: Galaxies, property: COUNTS}
  - name: lens2
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
  method: support
- spec:
  - name: lens3
    measurement: {subject: Galaxies, property: COUNTS}
  - name: lens3
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
  method: support
- spec:
  - name: lens4
    measurement: {subject: Galaxies, property: COUNTS}
  - name: lens4
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
  method: support

Equivalently, we may reduce the complexity of the code slightly and specify the use of auto-correlations only:

tp_collection = TwoPointBinFilterCollection(
    filters=[
        TwoPointBinFilter.from_args_auto(
            name=f"lens{i}",
            measurement=Galaxies.COUNTS,
            lower=2,
            upper=300,
        )
        for i in range(5)
    ],
    require_filter_for_all=True,
    allow_empty=True,
)
Markdown(f"```yaml\n{base_model_to_yaml(tp_collection)}\n```")
require_filter_for_all: true
allow_empty: true
filters:
- spec:
  - name: lens0
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
  method: support
- spec:
  - name: lens1
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
  method: support
- spec:
  - name: lens2
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
  method: support
- spec:
  - name: lens3
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
  method: support
- spec:
  - name: lens4
    measurement: {subject: Galaxies, property: COUNTS}
  interval: [2.0, 300.0]
  method: support

One may alternatively define the tracers directly (instead of from arguments) as TwoPointTracerSpec objects.
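
The YAML output above suggests the shape of such a specification: each filter carries a `spec` list of tracer entries (one entry when the filter is declared with `from_args_auto`, two when both tracers are given explicitly), an `interval`, and a `method`. The following is a minimal stand-alone sketch of that structure using plain dataclasses. `TracerSpec` and `BinFilter` are hypothetical stand-ins, not the Firecrown classes, and the measurement is reduced to a string for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TracerSpec:
    """Hypothetical stand-in for one tracer entry: a bin name and a measurement."""

    name: str
    measurement: str  # simplified; Firecrown uses enums such as Galaxies.COUNTS


@dataclass(frozen=True)
class BinFilter:
    """Hypothetical stand-in for a filter: tracer specs plus a scale interval."""

    spec: tuple
    interval: tuple
    method: str = "support"


# Build the tracer entry directly, then hand it to the filter.
lens0 = TracerSpec(name="lens0", measurement="Galaxies.COUNTS")

# One spec entry mirrors the auto-correlation form shown in the YAML above.
auto_filter = BinFilter(spec=(lens0,), interval=(2.0, 300.0))

# Two spec entries mirror the explicit pair form.
pair_filter = BinFilter(spec=(lens0, lens0), interval=(2.0, 300.0))

print(len(auto_filter.spec), len(pair_filter.spec), auto_filter.interval)
```

The sketch only mirrors the serialized layout; the actual TwoPointTracerSpec constructor arguments should be taken from the Firecrown API documentation.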

Using TwoPointExperiment

A TwoPointExperiment object keeps track of the Factory instances that generate the two-point configurations of the analysis (in either configuration or harmonic space) and of the scale cuts and data filtering choices used to evaluate the likelihood. The interpretation of a filter's lower and upper limits depends on whether the TwoPointExperiment factories are defined in configuration or harmonic space.
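
To illustrate why the interpretation matters, the sketch below applies the same numeric interval to two hypothetical sets of bin centers, one of angular separations (configuration space) and one of multipoles (harmonic space). The arrays, the helper `select_bins`, and the closed-interval comparison are assumptions made for illustration. Firecrown's own selection depends on the filter `method` and is not reproduced here.

```python
def select_bins(centers, lower, upper):
    """Return the bin centers that lie inside the closed interval [lower, upper]."""
    return [c for c in centers if lower <= c <= upper]


# Hypothetical bin centers; values and units are illustrative only.
bin_centers = {
    "real": [1.0, 5.0, 20.0, 100.0, 250.0],  # angular separations
    "harmonic": [10.0, 50.0, 100.0, 500.0, 1000.0],  # multipoles
}

lower, upper = 2.0, 300.0
kept = {space: select_bins(c, lower, upper) for space, c in bin_centers.items()}

for space, centers in kept.items():
    total = len(bin_centers[space])
    print(f"{space}: kept {len(centers)} of {total} bins -> {centers}")
```

The same pair of numbers removes the smallest separation in one case and the highest multipoles in the other, so limits must be chosen with the correlation space in mind.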

With this formalism, we can evaluate the likelihood exactly as in the Loading SACC Data tutorial by defining filters to be very wide. Alternatively, by setting a restrictively small filtered range, we can remove data from the analysis. In the example below, we filter out all galaxy clustering data by using an extremely narrow range.
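
The two collection-level flags used in these examples control what happens at the edges of this process. The sketch below spells out one plausible reading of them in plain Python: `require_filter_for_all` decides whether a measurement without a matching filter is an error, and `allow_empty` decides whether a filter that removes every point of a measurement is an error. This reading, the function `apply_filters`, and the sample data are assumptions for illustration, not Firecrown's implementation.

```python
def apply_filters(data, filters, require_filter_for_all=False, allow_empty=False):
    """Apply per-tracer scale intervals to a dict of {tracer: [scales]}.

    Assumed semantics (illustrative only):
    - a tracer with no filter is kept whole, unless require_filter_for_all
      is set, in which case it is an error;
    - a tracer left with no points is dropped if allow_empty is set,
      otherwise it is an error.
    """
    result = {}
    for tracer, scales in data.items():
        if tracer not in filters:
            if require_filter_for_all:
                raise ValueError(f"no filter matches {tracer}")
            result[tracer] = list(scales)
            continue
        lower, upper = filters[tracer]
        kept = [s for s in scales if lower <= s <= upper]
        if not kept:
            if not allow_empty:
                raise ValueError(f"filter removed all data for {tracer}")
            continue  # drop the now-empty measurement
        result[tracer] = kept
    return result


# Invented scales for one lens bin and one source bin.
data = {"lens0": [1.0, 10.0, 100.0], "src0": [5.0, 50.0]}

wide = apply_filters(data, {"lens0": (0.5, 300.0)}, allow_empty=True)
narrow = apply_filters(data, {"lens0": (2999.0, 3000.0)}, allow_empty=True)

print(wide)
print(narrow)
```

Under these assumptions the wide interval leaves the data untouched, while the narrow one removes the lens measurement entirely and keeps the unfiltered source measurement.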

from firecrown.likelihood import TwoPointFactory
from firecrown.likelihood.factories import (
    DataSourceSacc,
    TwoPointExperiment,
)
from firecrown.utils import base_model_from_yaml

two_point_yaml = """
correlation_space: real
weak_lensing_factories:
  - type_source: default
    per_bin_systematics:
    - type: MultiplicativeShearBiasFactory
    - type: PhotoZShiftFactory
    global_systematics:
    - type: LinearAlignmentSystematicFactory
      alphag: 1.0
number_counts_factories:
  - type_source: default
    per_bin_systematics:
    - type: PhotoZShiftFactory
    global_systematics: []
"""

tpf = base_model_from_yaml(TwoPointFactory, two_point_yaml)

two_point_experiment = TwoPointExperiment(
    two_point_factory=tpf,
    data_source=DataSourceSacc(
        sacc_data_file="../tests/sacc_data.hdf5",
        filters=TwoPointBinFilterCollection(
            require_filter_for_all=False,
            allow_empty=True,
            filters=[
                TwoPointBinFilter.from_args_auto(
                    name=f"lens{i}",
                    measurement=Galaxies.COUNTS,
                    lower=0.5,
                    upper=300,
                )
                for i in range(5)
            ],
        ),
    ),
)

two_point_experiment_filtered = TwoPointExperiment(
    two_point_factory=tpf,
    data_source=DataSourceSacc(
        sacc_data_file="../tests/sacc_data.hdf5",
        filters=TwoPointBinFilterCollection(
            require_filter_for_all=False,
            allow_empty=True,
            filters=[
                TwoPointBinFilter.from_args_auto(
                    name=f"lens{i}",
                    measurement=Galaxies.COUNTS,
                    lower=2999,
                    upper=3000,
                )
                for i in range(5)
            ],
        ),
    ),
)

Serializing TwoPointExperiment

The TwoPointExperiment objects can also be used to create likelihoods in the ready state. Additionally, they can be serialized into a YAML file, making it easier to share specific analysis choices with other users and collaborators.

The YAML below shows the first experiment:

Markdown(f"```yaml\n{base_model_to_yaml(two_point_experiment)}\n```")
two_point_factory:
  correlation_space: real
  weak_lensing_factories:
  - type_source: default
    per_bin_systematics:
    - {type: MultiplicativeShearBiasFactory}
    - {type: PhotoZShiftFactory}
    global_systematics:
    - {type: LinearAlignmentSystematicFactory, alphag: 1.0}
  number_counts_factories:
  - type_source: default
    per_bin_systematics:
    - {type: PhotoZShiftFactory}
    global_systematics: []
    include_rsd: false
  cmb_factories: []
  int_options: null
data_source:
  sacc_data_file: ../tests/sacc_data.hdf5
  filters:
    require_filter_for_all: false
    allow_empty: true
    filters:
    - spec:
      - name: lens0
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [0.5, 300.0]
      method: support
    - spec:
      - name: lens1
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [0.5, 300.0]
      method: support
    - spec:
      - name: lens2
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [0.5, 300.0]
      method: support
    - spec:
      - name: lens3
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [0.5, 300.0]
      method: support
    - spec:
      - name: lens4
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [0.5, 300.0]
      method: support
  normalize_window: true
ccl_factory: {require_nonlinear_pk: false, amplitude_parameter: sigma8, mass_split: normal,
  num_neutrino_masses: null, creation_mode: default, pure_ccl_transfer_function: boltzmann_camb,
  use_camb_hm_sampling: false, allow_multiple_camb_instances: false, camb_extra_params: null,
  ccl_spline_params: null, parameter_prefix: null}

The YAML below shows the second experiment:

Markdown(f"```yaml\n{base_model_to_yaml(two_point_experiment_filtered)}\n```")
two_point_factory:
  correlation_space: real
  weak_lensing_factories:
  - type_source: default
    per_bin_systematics:
    - {type: MultiplicativeShearBiasFactory}
    - {type: PhotoZShiftFactory}
    global_systematics:
    - {type: LinearAlignmentSystematicFactory, alphag: 1.0}
  number_counts_factories:
  - type_source: default
    per_bin_systematics:
    - {type: PhotoZShiftFactory}
    global_systematics: []
    include_rsd: false
  cmb_factories: []
  int_options: null
data_source:
  sacc_data_file: ../tests/sacc_data.hdf5
  filters:
    require_filter_for_all: false
    allow_empty: true
    filters:
    - spec:
      - name: lens0
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [2999.0, 3000.0]
      method: support
    - spec:
      - name: lens1
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [2999.0, 3000.0]
      method: support
    - spec:
      - name: lens2
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [2999.0, 3000.0]
      method: support
    - spec:
      - name: lens3
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [2999.0, 3000.0]
      method: support
    - spec:
      - name: lens4
        measurement: {subject: Galaxies, property: COUNTS}
      interval: [2999.0, 3000.0]
      method: support
  normalize_window: true
ccl_factory: {require_nonlinear_pk: false, amplitude_parameter: sigma8, mass_split: normal,
  num_neutrino_masses: null, creation_mode: default, pure_ccl_transfer_function: boltzmann_camb,
  use_camb_hm_sampling: false, allow_multiple_camb_instances: false, camb_extra_params: null,
  ccl_spline_params: null, parameter_prefix: null}

Creating and Comparing Likelihoods

Next, we create likelihoods from the TwoPointExperiment objects and compare the loglike values.

from firecrown.modeling_tools import ModelingTools
from firecrown.updatable import get_default_params_map

# Build the likelihood for the first experiment and set it up with the
# default parameter values.
tools = ModelingTools()
likelihood_tpe = two_point_experiment.make_likelihood()
params = get_default_params_map(tools, likelihood_tpe)

likelihood_tpe.update(params)
tools.update(params)
tools.prepare()

# Repeat for the filtered experiment. The name `tools` is rebound here, so
# the comparison below evaluates both likelihoods with this second instance,
# which is also built from the default parameter values.
likelihood_tpe_filtered = two_point_experiment_filtered.make_likelihood()

tools = ModelingTools()
params = get_default_params_map(tools, likelihood_tpe_filtered)

tools.update(params)
tools.prepare()
likelihood_tpe_filtered.update(params)

Compare the log-likelihood values:

print(f"Loglike from TwoPointExperiment: {likelihood_tpe.compute_loglike(tools)}")
print(
    f"Loglike from filtered TwoPointExperiment: {likelihood_tpe_filtered.compute_loglike(tools)}"
)
Loglike from TwoPointExperiment: -2742.739024737394
Loglike from filtered TwoPointExperiment: -2579.948781562013

The two values differ because the filtered experiment excludes the galaxy clustering data points from the likelihood.
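
The direction of the change can be understood with a toy Gaussian likelihood. Assuming a log-likelihood of the form -0.5 * chi^2 with a diagonal covariance (a simplification; the real analysis uses the covariance stored in the SACC file), each retained data point contributes a non-positive term, so discarding points can only raise the value. All numbers below are invented for illustration.

```python
def gaussian_loglike(data, theory, sigma):
    """Return -0.5 * chi^2 for independent Gaussian errors (diagonal covariance)."""
    chi2 = sum(((d - t) / s) ** 2 for d, t, s in zip(data, theory, sigma))
    return -0.5 * chi2


# Invented data vector, model prediction, and uncertainties.
data = [1.0, 2.0, 3.0, 4.0]
theory = [1.5, 2.5, 2.0, 4.0]
sigma = [0.5, 0.5, 0.5, 0.5]

full = gaussian_loglike(data, theory, sigma)

# Keep only the last two points, as a scale cut would.
keep = slice(2, None)
cut = gaussian_loglike(data[keep], theory[keep], sigma[keep])

print(f"full: {full}, after cut: {cut}")
```

This matches the pattern in the output above, where the filtered experiment reports the less negative log-likelihood.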

Next Steps