An Introduction to Firecrown

Version 1.8.0a0

Authors

Marc Paterno

Sandro Vitenti

Prologue

This document is based on the Firecrown tutorial given at the Feb 2023 DESC Meeting Sprint Session. It has been updated to work with the version of Firecrown noted on the title slide. A recording of the original talk is available.

What is Firecrown?

Firecrown¹ is a software framework² that allows you to write likelihoods in a way that will enable you to integrate those likelihoods with statistical frameworks for parameter estimation, forecasting, or any other purpose. In considering our options, one possibility is to choose a single statistical framework and exclusively rely on it. However, different analyses may present distinct requirements that can only be effectively addressed by utilizing different statistical frameworks. So Firecrown provides a single framework for writing likelihoods that allows DESC scientists to use those likelihoods with any of the supported statistical frameworks. Moreover, Firecrown is intended to provide a well-defined environment in which all the DESC tools needed for likelihood-dependent analysis tasks are present. To accomplish this objective, Firecrown directly uses the DESC Core Cosmology Library CCL and the SACC data format library.

¹ A firecrown is a hummingbird native to Chile and Argentina. The reasons this software is named Firecrown are now lost to the mists of history. (Figure: a green-backed firecrown.)

² A software framework is an abstraction in which software providing generic functionality can be selectively changed by additional user-written code, thus providing application-specific software. Definition from Wikipedia.

Note that Firecrown itself does not do sampling, nor does it run the sampling frameworks. Instead, whichever sampling framework you are using calls (through the Firecrown connector code) the Firecrown likelihood you configure.

Firecrown can also be used as a tool inside another framework. For example, it is directly used by the DESC forecasting and inference tool Augur. Augur uses Firecrown to calculate observations predicted by theory (“theory vectors”) and likelihoods for those observations, and from these Augur calculates Fisher matrices. Augur can also use Firecrown to create mock data and to run Markov Chain Monte Carlo (MCMC) parameter estimation on those data.

Firecrown currently supports three statistical frameworks: Cobaya³, CosmoSIS⁴, and NumCosmo⁵. Firecrown guarantees that the variety of DESC tools that it uses are instantiated correctly to be consistent with the use of any of these frameworks.

³ Cobaya (code for bayesian analysis, and Spanish for Guinea Pig) is a framework for sampling and statistical modeling: it allows you to explore an arbitrary prior or posterior using a range of Monte Carlo samplers (including the advanced MCMC sampler from CosmoMC, and the advanced nested sampler PolyChord). The results of the sampling can be analyzed with GetDist. It supports MPI parallelization (and very soon HPC containerization with Docker/Shifter and Singularity).

⁴ CosmoSIS is a cosmological parameter estimation code. It is a framework for structuring cosmological parameter estimation with a focus on flexibility, re-usability, debugging, verifiability, and code sharing in the form of calculation modules. It consolidates and connects existing code for predicting cosmic observables, and makes mapping out experimental likelihoods with a range of different techniques much more accessible.

⁵ NumCosmo is a free software C library whose main purposes are to test cosmological models using observational data and to provide a set of tools to perform cosmological calculations. Particularly, the current version has implemented three different probes: cosmic microwave background (CMB), supernovae type Ia (SNeIa), and large-scale structure (LSS) information, such as baryonic acoustic oscillations (BAO) and galaxy cluster abundance. The code supports joint analyses of these data and the parameter space can include cosmological and phenomenological parameters.

Basic Firecrown concepts

The three most important concepts represented in Firecrown are cosmology, modeling tools, and likelihoods. Each of these concepts is represented by some software artifact in Firecrown.

Firecrown’s concept of cosmology is provided by CCL. CCL provides all the necessary tools for calculating basic cosmological quantities. So everything that is general in cosmology is calculated by CCL, and not by Firecrown itself. This cosmology plays a central role in the set of tools provided to the user.

We also have the concept of modeling tools. These are a set of extra tools which, together with the CCL cosmology, allow one to calculate likelihoods. For example, ModelingTools has a member called pt_calculator. When the ModelingTools is instantiated in the likelihood initialization, the pt_calculator is also instantiated and configured. Then, the same object can be used by different parts of the likelihood. The des_y1_3x2pt_PT example uses the pt_calculator to calculate the power spectra that are used in the likelihood. Thus, every source that requires power spectra can use the same pt_calculator object. All the available tools are presented, along with the cosmology, for calculation of the likelihood. Therefore, during a statistical analysis, whenever the likelihood is called, all the objects in the modeling tools have already been updated to represent the “current cosmology” with which they are associated.

For the user who wants to calculate a likelihood that is not a Gaussian distribution, these are the only concepts in Firecrown that are needed. But since we are frequently working with Gaussian likelihoods, there are more software tools available for their support. These tools include more constrained likelihoods, statistics, sources, and systematics.

First, we have support for the Gaussian family of likelihoods. These are all the likelihoods that can be expressed as a function of the distance between the expected value of some observable quantity and the observed value of that quantity, where the measure of that distance is characterized by a covariance matrix. These are likelihoods of the form: \[P(\vec{x}|\vec{\mu},\widetilde{M}) = f(\chi^2)\] where \[\chi^2 = \sum_{i,j} (x_i - \mu_i) M_{i,j} (x_j - \mu_j)\] and where \(x_i\) are the components of the observed data vector \(\vec{x}\), \(\mu_i\) are the components of the predicted theory vector \(\vec{\mu}\), and \(M_{i,j}\) are the components of the inverse of the covariance matrix. In the Gaussian family, we currently have implemented the multivariate Gaussian distribution and the multivariate Student’s T distribution.
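The \(\chi^2\) sum above can be written out directly. The following is a minimal sketch in plain Python, not Firecrown code: the function names chi_squared and gaussian_loglike are hypothetical, and the log-likelihood is given only up to an additive constant, under the assumption that the inverse covariance matrix is fixed.

```python
def chi_squared(data, theory, inv_cov):
    """Return sum_{i,j} (x_i - mu_i) M_ij (x_j - mu_j)."""
    residual = [x - mu for x, mu in zip(data, theory)]
    n = len(residual)
    return sum(
        residual[i] * inv_cov[i][j] * residual[j]
        for i in range(n)
        for j in range(n)
    )


def gaussian_loglike(data, theory, inv_cov):
    """Gaussian log-likelihood up to an additive constant: -chi^2 / 2."""
    return -0.5 * chi_squared(data, theory, inv_cov)


# Toy numbers, chosen only for illustration.
data = [1.0, 2.0]                    # observed data vector x
theory = [0.5, 2.5]                  # predicted theory vector mu
inv_cov = [[4.0, 0.0], [0.0, 1.0]]   # M, the inverse of diag(0.25, 1.0)

chi2 = chi_squared(data, theory, inv_cov)
loglike = gaussian_loglike(data, theory, inv_cov)
```

For a Student's T member of the family, only the function \(f\) applied to \(\chi^2\) would change; the quadratic form itself stays the same.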

To build a Gaussian distribution, all one needs is to create a theory vector (the \(\mu_i\) above), and to get the data vector (\(x_i\) above) and covariance matrix \(\widetilde{C} = \widetilde{M}^{-1}\). The data vector and covariance matrix are typically read from a SACC file. The theoretical prediction associated with the data vector is computed by a statistic array. Users can either implement their own statistics objects or use those already available within Firecrown. For instance, to construct a likelihood based on a two-point function, one can use the classes that Firecrown already provides to represent such entities. The two-point function is a variety of statistic that is in turn dependent on sources. Sources are tools available for combining two observables (possibly the same observable, used twice) to create a two-point function, either a cross-correlation or an auto-correlation. Two-point statistics are simple layers that call the relevant CCL functions to calculate the necessary integrals, whereas sources are used to compute the integrands for the associated observables. So a statistic is a general concept, a two-point statistic is a specific kind of statistic, and sources are the means to calculate the observables for two-point statistics.

The systematic is a concept that is not yet so fully defined. Currently, systematics are a way of modifying the behavior of a theory prediction calculator. For example, if one has a distribution \(f(z) = dN/dz\) of some object in the sky as a function of redshift \(z\), and one wants to make a shift of this distribution (a bias) to the left or the right, this can be done using a systematic. One can put as many systematics as desired into the calculation of any statistic. Of course, one needs to take care that they are compatible and that the result makes sense. This is one of the parts of Firecrown that needs more development⁶; we are working to identify the set of types and functions that will help make sure that only meaningful combinations of systematics are created, and that systematic effects are not double-counted.

⁶ We invite contributions to the effort of defining the means of handling systematic effects. The Firecrown issues list can be used to discuss ideas for contributions.

High-level Firecrown classes

Each of these main Firecrown concepts is represented by one (or several) types in Firecrown.

The type used to represent a cosmology in Firecrown comes from CCL: pyccl.Cosmology. This class represents a parameterized cosmology.

The modeling tools are represented by firecrown.modeling_tools.ModelingTools. A ModelingTools object associates a cosmology with a set of objects representing theoretical models that can be used in a likelihood. Each of these may be used more than once in the evaluation of the likelihood. This is why they are gathered together in one location: to help assure that different parts of a likelihood calculation that require the same theoretical calculation get the identical theoretical calculation for a given cosmology.

Moreover, we define for ModelingTools an abstract class⁷ for each additional tool that can be used in the likelihood calculation. Thus, the same tool can have different implementations, and the user can choose which one to use. This is intended to partially address the issue of systematic effects, as we can have different implementations of the same tool, each one representing a different systematic effect. For example, we can have different implementations of the halo model, each one including different effects. This is in contrast to the current implementation, where we would have a single halo model that needs to have its results modified by Systematic objects.

⁷ An abstract class provides either methods or data (or both) for derived classes but is not complete. It is not possible to create an object whose type is an abstract class. Rather, one derives concrete classes from the abstract class and creates instances of those concrete types.

The likelihoods are represented by a base class firecrown.likelihood.Likelihood, and a variety of classes that inherit from that base class. The minimum implementation for a likelihood implements two methods:

read(sacc: sacc.SACC) -> None
calculate_loglike(tools: ModelingTools) -> float

The method read reads the necessary data (data vectors and covariances) from the provided sacc.SACC object. This specifies the data for which we are calculating the likelihood. The method calculate_loglike returns the (natural) logarithm of the likelihood for the data given the cosmology and models in tools. Gaussian-related likelihoods are subclasses of firecrown.likelihood.gauss_family.GaussFamily. Currently implemented subclasses include ConstGaussian and StudentT. ConstGaussian assumes a Gaussian distribution in which the covariance of the data is constant.
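The shape of this two-method interface can be sketched without Firecrown installed. Everything below is a hypothetical stand-in: SketchLikelihood and ToyGaussian are not Firecrown classes, a plain dict plays the role of the SACC object, and another dict plays the role of the modeling tools. It only illustrates how an abstract base class forces a concrete likelihood to supply both read and calculate_loglike.

```python
from abc import ABC, abstractmethod


class SketchLikelihood(ABC):
    """Hypothetical stand-in for a likelihood base class (not Firecrown's)."""

    @abstractmethod
    def read(self, sacc_data) -> None:
        """Read the data vector and covariance information."""

    @abstractmethod
    def calculate_loglike(self, tools) -> float:
        """Return the natural log of the likelihood."""


class ToyGaussian(SketchLikelihood):
    """Toy likelihood with independent Gaussian errors."""

    def __init__(self):
        self.data = []
        self.variance = []

    def read(self, sacc_data) -> None:
        # A plain dict stands in for the SACC object here.
        self.data = list(sacc_data["mean"])
        self.variance = list(sacc_data["variance"])

    def calculate_loglike(self, tools) -> float:
        # A dict with a "predict" callable stands in for the modeling tools.
        theory = tools["predict"](len(self.data))
        chi2 = sum(
            (x - mu) ** 2 / var
            for x, mu, var in zip(self.data, theory, self.variance)
        )
        return -0.5 * chi2


toy_sacc = {"mean": [1.0, 2.0], "variance": [1.0, 4.0]}
toy_tools = {"predict": lambda n: [0.0] * n}

toy_likelihood = ToyGaussian()
toy_likelihood.read(toy_sacc)
toy_loglike = toy_likelihood.calculate_loglike(toy_tools)
```

Trying to instantiate SketchLikelihood directly raises a TypeError, which is the behavior described for abstract classes in footnote 7.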

Building blocks for the `GaussFamily` likelihoods

For the Gaussian family of likelihoods, we have the base class GaussFamily. This is an abstract class that provides several features:

GaussFamily currently has two subclasses: ConstGaussian and StudentT. ConstGaussian implements a multivariate Gaussian likelihood with a covariance matrix that is constant (meaning that the covariance matrix does not vary with the cosmology, nor with any sampled parameters of the models in the ModelingTools.)

For any GaussFamily likelihood, one must have one or more Statistics. Statistic is an abstract base class for TwoPoint and Supernova. A Statistic provides access to observations (a data vector) and calculates predictions (a theory vector) based on a set of parameters (a cosmology). A Statistic is responsible for reading its data from a sacc.SACC object. A Statistic also has indices that are used to identify which blocks of the covariance matrix in the SACC object will be read. A given SACC object may contain information from observations in many bins, but only those indicated by the indices in a Statistic will be read. A Statistic may also contain systematics that modify the theory vector it calculates. All GaussFamily likelihoods have an implementation of the read method that reads data covariance information from the provided sacc.SACC object. Each of these likelihoods uses the indices from all of its (possibly many) Statistics to build the covariance matrix for the likelihood.
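The use of indices to pick out covariance blocks can be illustrated with a small sketch. This is not how Firecrown or SACC implement it; select_covariance is a hypothetical helper, and the matrix and indices are toy values. It assumes that each statistic contributes a list of integer positions into the full data vector, and that the likelihood concatenates those lists.

```python
def select_covariance(full_cov, indices):
    """Keep only the rows and columns of full_cov listed in indices."""
    return [[full_cov[i][j] for j in indices] for i in indices]


# Toy 4x4 covariance for a data vector with four entries.
full_cov = [
    [1.0, 0.1, 0.2, 0.3],
    [0.1, 2.0, 0.4, 0.5],
    [0.2, 0.4, 3.0, 0.6],
    [0.3, 0.5, 0.6, 4.0],
]

# Two hypothetical statistics: the first uses entry 0, the second entries 2 and 3.
statistic_indices = [[0], [2, 3]]

# The likelihood gathers the indices of all of its statistics, in order.
all_indices = [i for indices in statistic_indices for i in indices]
sub_cov = select_covariance(full_cov, all_indices)
```

Entry 1 of the data vector is never read, which mirrors the statement that only the bins indicated by the statistics' indices are used.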

The class firecrown.likelihood.gauss_family.two_point.TwoPoint is a statistic that represents a two-point function. A TwoPoint object has two Sources, each of which is associated with one or more tracer names. To calculate an autocorrelation, use the same Source twice. Each Source will produce one or more pyccl.Tracers.⁸

⁸ From the CCL documentation: Tracers contain the information necessary to describe the contribution of a given sky observable to its cross-power spectrum with any other tracer. Tracers are composed of 4 main ingredients: A radial kernel: this expresses the support in redshift/distance over which this tracer extends. A transfer function: this is a function of wavenumber and scale factor that describes the connection between the tracer and the power spectrum on different scales and at different cosmic times. An ell-dependent prefactor: normally associated with angular derivatives of a given fundamental quantity. The order of the derivative of the Bessel functions with which they enter the computation of the angular power spectrum.

Sometimes a source may have several tracers because it reflects a combination of different effects for the same kind of measurement.

Currently, we have two implementations of Source: NumberCounts and WeakLensing. The NumberCounts source represents a galaxy number count measurement in a given bin. Since these galaxies act as lenses, such sources are usually labeled as lens sources. The WeakLensing source represents a weak lensing measurement, which results from light emitted by the source galaxies being lensed by the matter distribution in the Universe. Such sources are usually labeled as src sources.

Systematics objects for Sources have a simple interface: for each source, there is a data class⁹ that has all the necessary information to build the sources and tracers. A source can have a list of systematics. When the source is evaluated, the list of systematics is iterated over, and the apply method of each is called, in order, given the previous value of the source and yielding a new value. If, for example, you have a source for weak lensing, and you want to move the distribution of \(dN/dz\), to apply a bias, this can be done with a systematic.

⁹ A data class is a class that contains mainly data, and which has several methods (such as those for printing, or equality testing) automatically generated by Python.
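The evaluation order described above, in which each systematic receives the previous value and yields a new one, can be sketched as follows. The classes ToySourceArgs, ToyPhotoZShift, and ToyAmplitude are hypothetical stand-ins and do not reproduce Firecrown's actual data classes or the signature of its apply method. The shift is modeled, as a simplifying assumption, by moving the redshift grid while keeping the sampled \(dN/dz\) values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToySourceArgs:
    """Hypothetical stand-in for the data class describing a source."""

    z: tuple            # redshift grid
    dndz: tuple         # dN/dz sampled on that grid
    scale: float = 1.0  # overall amplitude


class ToyPhotoZShift:
    """Toy systematic: shift the dN/dz distribution by delta_z."""

    def __init__(self, delta_z):
        self.delta_z = delta_z

    def apply(self, args):
        # Moving the grid while keeping dndz moves the whole distribution.
        shifted = tuple(zi + self.delta_z for zi in args.z)
        return replace(args, z=shifted)


class ToyAmplitude:
    """Toy systematic: rescale the amplitude of the source."""

    def __init__(self, factor):
        self.factor = factor

    def apply(self, args):
        return replace(args, scale=args.scale * self.factor)


def apply_systematics(args, systematics):
    """Call apply on each systematic in order, feeding each the previous value."""
    for systematic in systematics:
        args = systematic.apply(args)
    return args


initial = ToySourceArgs(z=(0.0, 0.5, 1.0), dndz=(0.2, 1.0, 0.3))
final = apply_systematics(initial, [ToyPhotoZShift(0.25), ToyAmplitude(2.0)])
```

Using a frozen data class means each systematic returns a new value rather than modifying its input, so the starting description of the source is left unchanged.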

Firecrown does not currently make a clear distinction between “systematics” that really are systematic effects, and others that are more like modeling choices. As we discussed above, the ModelingTools object handles systematic effects that are related to modeling choices. We should do the same for the Source objects. Then, the systematic effects that are not related to modeling choices can be handled by the Source objects. We are working on improving this.

Development workflow

All the tools provided in Firecrown exist to help you to create an instance of a likelihood for your analysis. The function that is used to create this likelihood is called a factory function. Note that this factory function is not creating a new type; it is responsible for creating an instance of the type (e.g. ConstGaussian) you have chosen for your analysis. The purpose of this factory function is to assemble the artifacts representing the data, modeling effects, systematic effects, etc., into a likelihood object, and to return that likelihood object (and the related modeling tools).
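The role of a factory function can be sketched with stand-in types. Nothing here is Firecrown's API: ToyModelingTools, ToyLikelihood, the function name build_toy_likelihood, and the keys of the build_parameters dict are all hypothetical. The sketch only shows the pattern of assembling the pieces, reading the data, and returning an instance of the likelihood together with the tools.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModelingTools:
    """Hypothetical stand-in for the modeling tools object."""

    pt_calculator: object = None


@dataclass
class ToyLikelihood:
    """Hypothetical stand-in for a likelihood instance."""

    statistic_names: list
    data: dict = field(default_factory=dict)

    def read(self, sacc_data):
        # Keep only the data belonging to the configured statistics.
        self.data = {name: sacc_data[name] for name in self.statistic_names}


def build_toy_likelihood(build_parameters):
    """Factory: create an instance (not a new type) and the related tools."""
    likelihood = ToyLikelihood(
        statistic_names=list(build_parameters["statistics"])
    )
    likelihood.read(build_parameters["sacc_data"])
    tools = ToyModelingTools()
    return likelihood, tools


# A plain dict stands in for the SACC object.
toy_sacc = {"shear_00": [1.0, 2.0], "shear_01": [3.0], "unused": [9.0]}
likelihood, tools = build_toy_likelihood(
    {"statistics": ["shear_00", "shear_01"], "sacc_data": toy_sacc}
)
```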

We will concentrate here on the workflow for creating a likelihood that uses TwoPoint statistics. The sources must be created before the statistics that use them. One typically creates several sources, both weak lensing sources and number count sources.

Once all the sources are created, we create a TwoPoint statistic for each pair of source combinations that we want to use. Naturally, these combinations must be present in the SACC object. One is free to use just a subset of the available combinations.
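Enumerating the source pairs, and keeping only those present in the data, might look like the sketch below. The source names and the set of available combinations are invented for illustration; in a real analysis the available combinations would come from the SACC object.

```python
from itertools import combinations_with_replacement

# Hypothetical source names: two weak lensing sources and one number counts source.
source_names = ["src0", "src1", "lens0"]

# Every unordered pair, including each source paired with itself (auto-correlation).
all_pairs = list(combinations_with_replacement(source_names, 2))

# Stand-in for the combinations actually present in the data file.
available_in_sacc = {
    ("src0", "src0"),
    ("src0", "src1"),
    ("src1", "lens0"),
    ("lens0", "lens0"),
}

# One two-point statistic would be created for each selected pair.
selected_pairs = [pair for pair in all_pairs if pair in available_in_sacc]
```

Filtering against the available set reflects the two constraints in the text: a combination must exist in the data, and one is free to use only a subset.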

Note that each source has a list of systematics that are applied to it. These systematics can be shared between sources, or they can be specific to a given source. When specific to a given source, any parameter name that is used in the systematic is prefixed with the name of the source. In contrast, when a systematic is shared between sources, the parameter names are not prefixed.
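The naming rule can be sketched with a small helper. The function parameter_name is hypothetical, the parameter and source names are invented, and the underscore used as the separator between the source name and the parameter name is an assumption made for this illustration.

```python
def parameter_name(param, source_name=None):
    """Return the name under which a systematic's parameter is exposed.

    A source-specific systematic gets the source name as a prefix;
    a shared systematic (source_name=None) keeps the bare name.
    The underscore separator is an assumption of this sketch.
    """
    if source_name is None:
        return param
    return f"{source_name}_{param}"


specific = parameter_name("delta_z", source_name="src0")
shared = parameter_name("ia_bias")
```

With this rule, two sources that each carry their own shift systematic expose two distinct parameters, while a shared systematic exposes a single parameter that all of its sources use.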

Currently, the TwoPoint statistic caches the pyccl.Tracers that are created for each source. This is done to avoid creating the same Tracer multiple times. We are planning to change this behavior in the future in order to avoid the need to cache the Tracers.
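The caching idea can be sketched as a small memoizing wrapper. ToyTracerCache and build_toy_tracer are hypothetical, and a dict stands in for a tracer; this is not the mechanism Firecrown uses internally, only an illustration of building each tracer once and reusing it.

```python
class ToyTracerCache:
    """Build a tracer once per source name and reuse it afterwards."""

    def __init__(self, build_tracer):
        self._build_tracer = build_tracer
        self._tracers = {}

    def get(self, source_name):
        if source_name not in self._tracers:
            self._tracers[source_name] = self._build_tracer(source_name)
        return self._tracers[source_name]


build_calls = []


def build_toy_tracer(source_name):
    # Record each call so that repeated construction would be visible.
    build_calls.append(source_name)
    return {"source": source_name}


cache = ToyTracerCache(build_toy_tracer)
first = cache.get("src0")
second = cache.get("src0")   # served from the cache, no second build
other = cache.get("lens0")
```

A cache like this must be cleared whenever the cosmology or sampled parameters change, which is one reason such bookkeeping is a candidate for removal.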

Examples in the repository

In the examples directory, we have subdirectories, each of which contains one or more related example uses of Firecrown. These examples are generally configured to run quickly. Thus they generally do not run any real MCMC sampling. In each of the directories, there is a README file that contains a short description of the example and includes directions on how to run it. Some of the examples also include a program to generate additional files needed to run the example.

Currently, all the examples use the ConstGaussian likelihood.

In the cosmicshear directory we have a DES Y1 cosmic shear analysis. This example demonstrates only the use of CosmoSIS with Firecrown. The likelihood function created demonstrates the use of the ConstGaussian likelihood with TwoPoint statistic and the WeakLensing source with a PhotoZShift systematic.
The des_y1_3x2pt directory demonstrates several related likelihoods, each created by a different factory function. The simplest is des_y1_3x2pt.py. This uses a ConstGaussian likelihood containing a multiplicity of TwoPoint statistics, built from all combinations of several weak lensing sources and several number counts sources. It demonstrates the use of multiple systematics for a source (specifically for weak lensing sources).

The two other likelihoods demonstrate the use of some advanced systematics. Perturbation theory corrections are demonstrated in des_y1_3x2pt_PT.py. TATT corrections are demonstrated in des_y1_3x2pt_TATT.py.

The samples in this directory work with Cobaya, CosmoSIS, and NumCosmo.
The srd_sn directory contains an example of the use of the Supernova statistic. It includes both CosmoSIS and NumCosmo examples. The NumCosmo example demonstrates the construction of a Fisher matrix, using an adaptive algorithm for the calculation of derivatives.
The cluster_number_counts directory contains an example of the use of the ClusterNumberCounts statistic. It includes both CosmoSIS and NumCosmo examples.

Installation modes

The installation methods for Firecrown support two different user roles: developers and non-developers. One is acting as a non-developer when using Firecrown only through installed packages, and making no additions to or modifications of the Firecrown code. One is acting as a developer when either modifying existing code, adding new code, or both. Because Firecrown is under rapid development, we expect most users to be acting in the developer role. That is the role we will discuss in more depth.

Developers require access to the Firecrown source code. They also require access to the complete set of software upon which Firecrown depends, as well as the development tools we use with Firecrown. All the software packages upon which Firecrown and its development environment rely will be installed with conda when possible, or with pip when a conda installation is not possible. The Firecrown source code will be cloned from the repository on GitHub.

Developer installation

The developer installation instructions (below) will:

Clone the Firecrown repository.
Create a Conda environment into which all the packages will be installed. This includes both the packages installed using conda and those that are installed using pip.
Build the CosmoSIS Standard Library (CSL) for use with Firecrown. The CSL cannot, because of licensing issues, be installed with conda. It can be built into an already-existing Conda environment.

This installation only needs to be done once.

Clone the Firecrown repository

Choose a directory in which to work. In this directory, you will be cloning the Firecrown repository and later building some of the non-Firecrown code that is not installable through conda. Note that this is not the directory in which the conda environment is created, nor is it the directory in which the CosmoSIS Standard Library (CSL) will be built.

git clone https://github.com/LSSTDESC/firecrown.git

Installation of dependencies

These instructions will create a new conda environment containing all the packages used in development. This includes testing and code verification tools used during the development process. We use the command conda in these instructions, but you may prefer instead to use mamba. The Mamba version of Conda is typically faster when solving environments, which is done both on installation and during updates of the environment.

We recommend that you execute these commands starting in the same directory as you were in when you cloned the Firecrown repository above. The cosmosis-build-standard-library command below will clone and then build the CosmoSIS Standard Library. We recommend doing this in the directory in which the conda environment resides. We have found this helps to make sure that only one version of the CSL is associated with any development efforts using the associated installation of CosmoSIS. It also makes it easier to keep all of the products in the conda environment consistent when updating is needed. Because the CI system is typically using the newest environment available, developers will periodically need to update their own development environments.

# conda env update, when run as suggested, is able to create a new environment, as
# well as updating an existing environment.
conda env update -f firecrown/environment.yml
conda activate firecrown_developer
# We define two environment variables that will be defined whenever you activate
# the conda environment.
conda env config vars set CSL_DIR=${CONDA_PREFIX}/cosmosis-standard-library FIRECROWN_DIR=${PWD}/firecrown
# The command above does not immediately define the environment variables.
# They are made available on every fresh activation of the environment.
# So we have to deactivate and then reactivate...
conda deactivate
conda activate firecrown_developer
# Now we can finish building the CosmoSIS Standard Library.
source ${CONDA_PREFIX}/bin/cosmosis-configure
# We want to put the CSL into the same directory as the conda environment upon which it depends
cd ${CONDA_PREFIX}
cosmosis-build-standard-library
# Now change directory into the firecrown repository
cd ${FIRECROWN_DIR}
# And finally make an editable (developer) installation of firecrown into the conda environment
python -m pip install --no-deps --editable ${PWD}

Setting up a shell session for development

These instructions assume you have already done the installation, above, presumably in an earlier shell session. If you have just completed the installation and are in the same shell session, you do not need to execute these commands — you have already done so!

conda activate firecrown_developer
cd ${FIRECROWN_DIR}

Each of the two defined environment variables is used for a different purpose:

CSL_DIR is used in CosmoSIS ini files to allow the cosmosis command to be run from any directory.
FIRECROWN_DIR is used in the examples that come with Firecrown.
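Before running an example, it can be useful to confirm that both variables are visible to Python. The helper below is a hypothetical convenience and is not part of Firecrown; it only checks that the two variable names set during installation are present and non-empty.

```python
import os

# The two variables set by the installation instructions.
REQUIRED_VARIABLES = ("CSL_DIR", "FIRECROWN_DIR")


def missing_variables(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
example = missing_variables({"CSL_DIR": "/path/to/cosmosis-standard-library"})
```

Calling missing_variables() with no argument inspects the real environment; a non-empty result suggests that the conda environment has not been activated in the current shell session.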

Building Firecrown

There are two options for working on the Firecrown code. One is to do an editable installation using python -m pip install --no-deps -e .; the other is to directly use the setup.py script. We recommend the editable installation with python -m pip install --no-deps -e . because direct use of the setup.py file is deprecated in recent versions of setuptools.

cd ${FIRECROWN_DIR}
python -m pip install --no-deps -e .

We recommend python -m pip ... rather than direct use of pip ... to help ensure that the pip that is found is the one consistent with the python used by the environment. Note the inclusion of the --no-deps option; this helps make sure that no other packages are accidentally installed. If running this command fails because of a missing dependency, you should install the required product using conda rather than pip.

Code development hygiene

We use a variety of tools to help improve the quality of the Firecrown code. Note that as of this release, we are still improving and expanding our use of these tools. The continuous integration (CI) system used for Firecrown applies all of these tools automatically and will reject any pull request that fails on one or more of the tools.

Some of the tools we use help to keep the Firecrown code in conformance with the PEP 8¹⁰ style guidelines.

¹⁰ Python Enhancement Proposal (PEP) 8 is the official (from the Python development team) style guide for Python code. This style guide is used for code in the Python distribution itself. It can be read at https://peps.python.org/pep-0008/.

¹¹ Black is a PEP 8 compliant opinionated formatter with its own style. Documentation for black is available at https://black.readthedocs.io.

We use black¹¹ as our code formatter. In addition to helping to make the Firecrown code easier to read through consistent formatting, this also makes it easier to understand pull requests, since they will not generally contain changes that only change the formatting. When used with the --check flag, black does not modify the code — it merely reports whether the code layout matches its requirements. To reformat code, run black without the --check flag.

We use flake8¹² to more completely verify PEP 8 compliance. This tool identifies some issues that are not code formatting issues and which are not identified and repaired by black. Two examples are the PEP 8 specified ordering of import statements and identification of unused import statements.

¹² flake8 is a linting tool that helps to identify deviations from the recommended PEP 8 Python coding guidelines. Its documentation is available at https://flake8.pycqa.org.

¹³ Mypy is a static type checker for Python. Documentation for it is found at https://mypy.readthedocs.io.

We are using type annotations in Firecrown for several reasons. They help in the automatic generation of documentation, and when used with a tool like mypy they help make sure the type information in the documentation does not diverge from the code itself. They help many different integrated development environments (IDEs) provide better code completion options. They also can be used by static type checking tools to identify some types of coding error that otherwise could only be identified through exhaustive testing. We strongly recommend that new code added to Firecrown should have appropriate type annotations. We use mypy¹³ as our static type checking tool.

We use pylint¹⁴ to help identify additional categories of errors that are not detected by the other tools.

¹⁴ Pylint is a static code analyzer for Python. Documentation for it is available at https://pylint.readthedocs.io.

¹⁵ The pytest framework makes it easy to write small, readable tests, and can scale to support complex functional testing for applications and libraries. The documentation for pytest is available at https://docs.pytest.org.

We also have unit tests that unfortunately cover only a part of the Firecrown code. We use pytest¹⁵ to run these tests. We are actively working on improving the coverage of the Firecrown unit tests. We strongly recommend that any new code be accompanied by unit tests, in addition to examples of use.

All of these tools are included in the Conda environment created by the development installation instructions.

The following is the set of commands using these tools that are used by the CI system. Since a pull request that fails any of these will be automatically rejected by the CI system, we strongly recommend running them before committing and pushing your code. Note that we have not yet completed the cleanup of the whole Firecrown repository, and so we do not yet apply pylint to all of the code. We strongly recommend that any new code you write should be checked with pylint before it is committed to the repository. We are actively working toward full coverage of the code, and will activate additional checking in the CI system as this work progresses.

black --check firecrown tests examples
flake8 firecrown tests examples
mypy firecrown tests examples
pylint --rcfile tests/pylintrc tests
pylint --rcfile firecrown/models/pylintrc firecrown/models
pylint firecrown/connector firecrown/*.py firecrown/likelihood/*.py \
    firecrown/likelihood/gauss_family/*.py
python -m pytest -v tests