pyPipe3D’s pipeline of IFS data cube analysis

Presentation

This document presents the sequence of procedures adopted by the pyPipe3D pipeline, specially suited for the analysis of a galaxy IFS data cube. The main products of the analysis comprise the maps of the stellar populations and the ionized gas properties. The pipeline is standardized, which requires the input data to be pre-processed and some configuration files to be edited depending on the required analysis.

Cube and row-stacked spectra (RSS) files

We define a “data cube” as a 3D FITS file in which the first 2 dimensions correspond to the spatial distribution of the pixels (spaxels; X,Y) and the 3rd dimension holds the wavelength information of the IFU data (Z). Data cubes stored in this way comprise data with the same spatial step in the X and Y directions (normally in arcsec) and a constant wavelength step in the Z direction (normally in Angstroms).

The WCS information is stored in the corresponding header entries:

CRPIX1  => Pixel of the starting position in RA (or X)
CRVAL1  => Starting position in RA (or X)
CDELT1  => Spatial extension of a pixel in the X-direction (in arcsec normally)
CRPIX2  => Pixel of the starting position in DEC (or Y)
CRVAL2  => Starting position in DEC (or Y)
CDELT2  => Spatial extension of a pixel in the Y-direction (in arcsec normally)
CRPIX3  => Pixel of the starting wavelength
CRVAL3  => Starting Wavelength in AA
CDELT3  => Wavelength step in AA per pixel

We define an “RSS” (“Row-Stacked Spectra”) file as a 2D FITS file comprising NY individual spectra of the same length (NX), with a normalized wavelength step and initial wavelength defined by the header keywords:

CRPIX1  => Pixel of the starting wavelength
CRVAL1  => Starting Wavelength in AA
CDELT1  => Wavelength step in AA per pixel

Format of the SSP library template FITS file

The template libraries used by pyFIT3D are RSS files in which each row stores a single stellar population. In pyFIT3D they are read using the class SSPModels. The SSP spectra should share a common wavelength calibration, with the same step in AA per pixel and the same starting wavelength, defined by the header entries:

CRPIX1
CRVAL1
CDELT1

Each SSP also requires a NAME header keyword, with the following format:

NAME0 | spec_ssp_00.0891_z0004.spec
NAME1 | spec_ssp_00.0891_z019.spec
NAME2 | spec_ssp_00.0891_z030.spec
NAME3 | spec_ssp_00.4467_z0004.spec

i.e., spec_ssp_AGE_zMET.spec, where AGE is the SSP age in Gyr and MET is the metallicity expressed by its decimals only, i.e.,

Z/H = 0.0004  should be 0004
Z/H = 0.02    should be 02

These entries are mandatory to allow the pipeline to understand the inputs.
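The naming rule above can be decoded as in the following sketch. The function parse_ssp_name and its regular expression are hypothetical helpers written for this document; they only illustrate the convention and do not reproduce how SSPModels reads the header.

```python
import re

# Pattern for names such as "spec_ssp_00.0891_z0004.spec".
_NAME_RE = re.compile(r"^spec_ssp_(?P<age>[0-9.]+)_z(?P<met>[0-9]+)\.spec$")

def parse_ssp_name(name):
    """Return (age in Gyr, metallicity) encoded in an SSP NAME entry."""
    match = _NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unexpected SSP name: {name!r}")
    age_gyr = float(match.group("age"))
    # MET holds only the decimals: "0004" -> 0.0004, "02" -> 0.02.
    metallicity = float("0." + match.group("met"))
    return age_gyr, metallicity

print(parse_ssp_name("spec_ssp_00.0891_z0004.spec"))
```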

The normalization wavelength of the models can also be set in the header through the keyword WAVENORM. If WAVENORM does not exist in the header, the class sweeps all the models looking for the wavelengths where the flux is closest to 1, calculates the median of those wavelengths and returns it.
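One possible reading of this fallback is sketched below: for every model, take the wavelength at which the flux is closest to 1, then return the median of those wavelengths. The function name and the toy models are assumptions made for illustration; the actual SSPModels code may differ in its details.

```python
import numpy as np

def guess_normalization_wavelength(wavelength, flux_models):
    """Median of the wavelengths where each model flux is closest to 1.

    wavelength  : 1D array, shape (n_wave,)
    flux_models : 2D array, shape (n_models, n_wave), one SSP per row
    """
    wavelength = np.asarray(wavelength, dtype=float)
    flux_models = np.asarray(flux_models, dtype=float)
    # Index of the pixel whose flux is closest to 1, for every model.
    closest = np.argmin(np.abs(flux_models - 1.0), axis=1)
    return float(np.median(wavelength[closest]))

# Three toy models whose flux is exactly 1 at 5004, 5005 and 5006 AA.
wave = np.arange(5000.0, 5010.0)
models = np.array([1.0 + 0.1 * np.abs(wave - w0) for w0 in (5004.0, 5005.0, 5006.0)])
wave_norm = guess_normalization_wavelength(wave, models)
print(wave_norm)
```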

Notes on the input data preprocessing

TODO: MISSING DETAILS

Pipeline procedure of analysis

The pipeline analysis script is ana_single.py. When called, the script runs and produces all the files in the running directory. At the end of the complete analysis, it moves everything to the configured directories.

The analysis begins by reading a single input configuration file. Other input files are needed along the procedure (e.g. SSP template libraries, configuration files with the emission lines systems to be analyzed, output data to be stored in the final data cubes, wavelengths masked during the analysis, etc). The pyPipe3D repository includes a complete example of a data cube, with all the needed configuration files. This example is described in the “Example of the pipeline analysis” Section.

The pipeline analysis then proceeds as follows:

  1. Read the data cube (object spectra and FITS header). Error spectra, bad pixels and bad spaxels masks are also available.

  2. Build the signal, noise and SN maps.

  3. Create a slice (2D map) of the data cube (3D) representing the V-Band.

  4. Create the spatial mask of the field-of-view.

  5. Extract central and integrated spectrum.

  6. Define the redshift range of the analysis.

  7. Main pyFIT3D analysis of the central and integrated spectra.

  8. CS-binning (adopted FoV segmentation; RSS creation).

  9. First round of pyFIT3D analysis of the binned data (row-stacked spectra, RSS). NOTE: This analysis will produce the ionized gas modelled spectra and the estimation of the non-linear parameters (redshift, velocity dispersion and dust attenuation) of the stellar populations.

  10. Create the ionized gas data cube and RSS.

  11. Second round of pyFIT3D analysis of the binned data. NOTE: This analysis is performed over the gas-free spectrum (residual from the difference between the observed and the gas model spectrum). Neither the non-linear nor the emission lines fit steps are performed during this procedure (their values are taken from the step 9 analysis).

  12. Stellar absorption indices analysis. NOTE: Measurement of certain stellar absorption strength indices, such as the Lick index system.

  13. Create all maps of pyFIT3D analysis of the binned data.

  14. pyFIT3D emission lines analysis of the ionized gas cube created at step 10.

  15. Gzip all fits files (.fits -> .fits.gz).

  16. Moment analysis of the ionized gas cube created at step 10.

  17. Create the Mass, Age, Metallicity and SFH maps.

  18. Create final product data cubes.

  19. Move all files to output directories.
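The gas-free spectrum used at step 11 is described above as the residual between the observed spectrum and the gas model spectrum. A minimal numerical sketch of that subtraction follows; the function name and the toy arrays are illustrative assumptions, not pyPipe3D code.

```python
import numpy as np

def gas_free_spectrum(observed, gas_model):
    """Residual spectrum: observed spectrum minus the ionized gas model."""
    return np.asarray(observed, dtype=float) - np.asarray(gas_model, dtype=float)

# Toy spectrum: flat continuum of 1.0 plus one emission line of peak 3.0.
observed = np.array([1.0, 1.0, 4.0, 1.0, 1.0])
gas_model = np.array([0.0, 0.0, 3.0, 0.0, 0.0])
print(gas_free_spectrum(observed, gas_model))
```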

Output files

The main data cubes produced by the pipeline are divided into five FITS files. The information stored in three of them (SSP, SFH and ELINES) is gathered from the maps output by steps 3, 4, 8, 9, 11 and 14; which maps (output files) are stored in which cube depends on the pack configuration files. The other two are output directly by steps 12 (indices) and 16 (flux_elines). The calculated indices are hard coded in pyFIT3D, whereas the list of emission lines of the latter can be configured by the user.

Main data cubes

These are the five main data cubes produced by the pyPipe3D pipeline:

  1. NAME.SSP.cube.fits.gz: Data output from steps 3, 4, 8, 9 and 11.

  • Main parameters derived from the analysis of the stellar populations, including the LW and MW ages and metallicities, dust attenuation and kinematics of the stellar populations.

  2. NAME.SFH.cube.fits.gz: Data output from step 11.

  • Weights of the decomposition of the stellar population for the adopted SSP templates library. They can be used to derive the spatially resolved star formation and chemical enrichment histories of the galaxies and the LW and MW properties included in the SSP data products.

  3. NAME.ELINES.cube.fits.gz: Data output from step 14.

  • Flux intensities of the nine strongest emission lines in the optical wavelength range, together with the kinematic properties of \({\rm H}\alpha\), derived from a Gaussian fit of each emission line.

  4. indices.NAME.cube.fits.gz: Data output from step 12.

  • Set of stellar absorption indices maps.

  5. flux_elines.NAME.cube.fits.gz: Data output from step 16.

  • Maps of the main parameters of a set of more than 50 emission lines derived using a weighted moment analysis of the ionized gas spectra. This analysis depends on the kinematics of \({\rm H}\alpha\) derived at step 14.
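The naming pattern of the five cubes can be summarized by the small helper below. It is a convenience sketch written for this document: main_cube_names is a hypothetical function, not something provided by pyPipe3D.

```python
def main_cube_names(name):
    """File names of the five main data cubes for the object `name`."""
    return [
        f"{name}.SSP.cube.fits.gz",
        f"{name}.SFH.cube.fits.gz",
        f"{name}.ELINES.cube.fits.gz",
        f"indices.{name}.cube.fits.gz",
        f"flux_elines.{name}.cube.fits.gz",
    ]

for filename in main_cube_names("NGC2916"):
    print(filename)
```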

Other output files

TODO: MISSING DETAILS

Example of the pipeline analysis

We provide an example run of a galaxy IFS data cube analysis in the sub-directory examples/IFS_analysis. The input data is the CALIFA IFS V500 data cube of the galaxy NGC 2916. Below we list the ancillary data needed for the analysis of the cube, together with a brief description of each file:

pyFIT3D/examples/IFS_analysis
├── ana_single_example.ini                 => input configuration file
├── config                                 => ancillary configuration files directory
│   ├── auto_ssp_V500_no_lines.config      => auto_ssp (pyFIT3D) configuration file (no emission lines to be fitted)
│   ├── auto_ssp_V500_several.config       => auto_ssp (pyFIT3D) configuration file
│   ├── auto_ssp_V500_several_SII_z.config => auto_ssp (pyFIT3D) configuration file
│   ├── cont_V500.config                   => emission lines configuration file without emission lines (only continuum)
│   ├── emission_lines.LIST                => emission lines list for the moment analysis
│   ├── emission_lines.txt                 => emission lines to be masked during pyFIT3D analysis
│   ├── Ha_SII_V500.config                 => [NII]+Ha+[SII] emission lines system configuration file
│   ├── Ha_V500.config                     => [NII]+Ha emission lines system configuration file
│   ├── Hd_V500.config                     => Hd emission lines system configuration file
│   ├── Hg_V500.config                     => Hg emission lines system configuration file
│   ├── mask_elines.txt                    => wavelengths intervals masked during pyFIT3D analysis
│   ├── OIII_V500.config                   => [OIII]+Hb emission lines system configuration file
│   ├── OII_V500.config                    => [OII] emission lines system configuration file
│   ├── pack_CS_inst_disp.csv              => pack configuration file for SSP data cube.
│   ├── pack_elines_v1.5.csv               => pack configuration file for ELINES data cube.
│   ├── pack_SFH.csv                       => pack configuration file for SFH data cube.
│   ├── SII_V500.config                    => [SII] emission lines system configuration file
│   └── slice_V.conf                       => V-band slice
├── data                                   => input data cube directory
│   └── NGC2916.V500.rscube.fits.gz        => input data cube FITS file
├── README.txt                             => This file.
└── ssp                                    => input SSP templates for the analysis
    ├── gsd01_12.fits                      => GSD (12 templates - 3 ages, 4 metallicities)
    ├── gsd01_156.fits                     => GSD (156 templates - 39 ages, 4 metallicities)
    └── miles_2_gas.fits                   => MILES (6 templates - 3 ages, 3 metallicities)

Configuration files

The present version of pyFIT3D is designed so that old configuration files and scripts continue to run. Some inputs are no longer needed, although the configuration parser of the present version of the code still requires them. In future versions, the idea is to reduce those files to the minimum. Those still needed will be redesigned to be read with the python configparser module. The new pipeline of analysis already uses this module as its configuration file parser (see the Run example presented below). The needed input configuration files are described next.

Emission lines system configuration file

The emission lines fit algorithm of pyFIT3D needs a configuration file comprising the information of the model to fit. A model is built from functions; in this particular context, the most relevant ones are eline and poly1d. Any configuration file should start with a line with four entries: the first one is kept unused and should be “0”, the second defines the number of functions that comprise the model, the 3rd is the goal \(\chi^2\) and the 4th is the goal minimum variation of the \(\chi^2\) between iterations (i.e., the convergence criterion).

After that line, each function is described by a block of rows: the first row defines the function (e.g., eline for the emission lines), and the following 9 rows configure its input parameters. Each parameter requires 5 entries: (1) the input guess of the parameter, (2) a flag indicating whether it is fitted [1] or left fixed [0], (3) and (4) the minimum and maximum values allowed for the parameter, and (5) a possible link of this parameter to the same parameter of another function.

If there is no link, entry (5) is set to “-1”; otherwise, it gives the position, within the configuration file, of the function to which the parameter is linked.

pyFIT3D considers two kinds of links: (a) additive links and (b) multiplicative links. For a linked parameter, entry (5) holds the number of the function in the configuration file to which the current function is linked, and entry (4) should be set to “0” (additive link) or “1” (multiplicative link). Entry (3) then becomes the value to be added to (case a) or multiplied by (case b) the parameter of the linked function to define the parameter of the current function.
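The arithmetic of the two kinds of links can be sketched as follows. The function linked_value and the numbers are hypothetical and only restate the rule above; they are not taken from the pyFIT3D source.

```python
def linked_value(reference_value, entry3, entry4):
    """Parameter value implied by a link to another function.

    reference_value : value of the same parameter in the linked function
    entry3          : amount to add (additive) or factor (multiplicative)
    entry4          : 0 for an additive link, 1 for a multiplicative link
    """
    if entry4 == 0:
        return reference_value + entry3
    if entry4 == 1:
        return reference_value * entry3
    raise ValueError("entry (4) must be 0 (additive) or 1 (multiplicative)")

# Hypothetical numbers, for illustration only.
print(linked_value(100.0, 5.0, 0))  # additive link
print(linked_value(100.0, 0.5, 1))  # multiplicative link
```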

If a function has fewer than 9 parameters, the unused rows have to be filled with:

0      0       0       0       -1

until the block contains 9 rows.

For an eline the parameters are the following ones:

eline
CENTRAL_WAVELENGTH     0       0       0       -1
INTEGRATED_INTENSITY   1       -0.1    1e10    -1
SIGMA_OF_THE_GAUSSIAN  0       4.0     4.5     -1
SYSTEMIC_VELOCITY      1       4200     7800   -1
0      0       0       0       -1
0      0       0       0       -1
0      0       0       0       -1
0      0       0       0       -1
0      0       0       0       -1
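The layout described in this section (one header line, then for each function a name row followed by 9 parameter rows of 5 entries) can be read with a short parser such as the sketch below. The function parse_elines_config, the dictionary keys and the example values are assumptions made for illustration; pyFIT3D has its own reader for these files.

```python
N_PARAM_ROWS = 9  # every function block carries 9 parameter rows

def parse_elines_config(text):
    """Parse an emission lines system configuration given as a string."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    _unused, n_functions, chi_goal, d_chi_goal = lines[0]
    config = {
        "n_functions": int(n_functions),
        "chi_goal": float(chi_goal),
        "d_chi_goal": float(d_chi_goal),
        "functions": [],
    }
    row = 1
    for _ in range(config["n_functions"]):
        name = lines[row][0]
        params = []
        for entries in lines[row + 1 : row + 1 + N_PARAM_ROWS]:
            guess, fit_flag, vmin, vmax, link = entries
            params.append({
                "guess": float(guess),
                "fit": int(fit_flag) == 1,
                "min": float(vmin),
                "max": float(vmax),
                "link": int(link),
            })
        if len(params) != N_PARAM_ROWS:
            raise ValueError(f"function {name!r} has fewer than 9 rows")
        config["functions"].append({"name": name, "params": params})
        row += 1 + N_PARAM_ROWS
    return config

# Hypothetical single-line system; the numbers are illustrative only.
example = """0 1 0.2 0.001
eline
6562.8  0  0     0     -1
1.0     1  -0.1  1e10  -1
2.5     1  1.0   4.5   -1
6000    1  4200  7800  -1
""" + "0 0 0 0 -1\n" * 5

config = parse_elines_config(example)
print(config["functions"][0]["name"], len(config["functions"][0]["params"]))
```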

For a poly1d the parameters are the coefficients of a polynomial function (with the 1st one being a constant).

Stellar population synthesis configuration file

The stellar continuum fitting algorithm of pyFIT3D (which can be executed for a single spectrum using the script ana_spec.py) needs a configuration file. It carries the entries for the redshift, velocity dispersion and dust attenuation (guess value, step of each variation, minimum value, maximum value), followed by the entries that configure the fitting of the emission lines: the number of systems (or groups of emission lines) to fit, followed by the same number of rows with the definitions required to run the emission lines fit (START_W END_W MASK_FILE CONFIG_FILE XXX none XXX XXX). The last two rows are deprecated. The configuration file should have the following format:

REDSHIFT DELTA_REDSHIFT MIN_REDSHIFT MAX_REDSHIFT XXX XXX XXX XXX MIN_WAVELENGTH_KIN MAX_WAVELENGTH_KIN
SIGMA DELTA_SIGMA MIN_SIGMA MAX_SIGMA
AV DELTA_AV MIN_AV MAX_AV
N_SYSTEMS
START_W END_W MASK_FILE CONFIG_FILE XXX none XXX XXX
(2, ..., N_SYSTEMS-1)
START_W END_W MASK_FILE CONFIG_FILE XXX none XXX XXX
XXX XXX XXX
XXX XXX
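A minimal sketch of a reader for this format follows. It keeps only the documented entries and ignores the XXX placeholders and the two deprecated trailing rows. The function parse_auto_ssp_config, the dictionary keys and every numerical value in the example are assumptions made for illustration.

```python
def parse_auto_ssp_config(text):
    """Parse the stellar population synthesis configuration from a string."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]

    def guess_block(entries):
        # Guess value, step, minimum and maximum of a non-linear parameter.
        guess, delta, vmin, vmax = (float(x) for x in entries[:4])
        return {"guess": guess, "delta": delta, "min": vmin, "max": vmax}

    config = {
        "redshift": guess_block(rows[0]),
        # Last two entries of the first row: wavelength range for kinematics.
        "wavelength_kin": (float(rows[0][8]), float(rows[0][9])),
        "sigma": guess_block(rows[1]),
        "AV": guess_block(rows[2]),
        "systems": [],
    }
    n_systems = int(rows[3][0])
    for entries in rows[4 : 4 + n_systems]:
        start_w, end_w, mask_file, config_file = entries[:4]
        config["systems"].append({
            "start_w": float(start_w),
            "end_w": float(end_w),
            "mask_file": mask_file,
            "config_file": config_file,
        })
    return config

# Hypothetical content; all numbers are illustrative only.
example = """0.012 0.0001 0.010 0.014 0 0 0 0 3800 4700
2.5 0.5 1.2 9.0
0.2 0.1 0.0 1.6
2
6530 6630 mask_elines.txt Ha_V500.config 0 none 0 0
4840 5020 mask_elines.txt OIII_V500.config 0 none 0 0
0 0 0
0 0
"""
ssp_config = parse_auto_ssp_config(example)
print(ssp_config["redshift"]["guess"], len(ssp_config["systems"]))
```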

Moment analysis emission lines list

This file contains a list of the central wavelengths of the emission lines analyzed by the moment analysis (step 16). It is a simple file comprising the central wavelength and a name for each emission line, with the following format:

CENTRAL_WAVELENGTH NAME
CENTRAL_WAVELENGTH NAME
(...)
CENTRAL_WAVELENGTH NAME
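Reading such a list takes only a few lines, as in the sketch below. The function read_emission_lines_list and the two example entries are illustrative assumptions; the file shipped with the example (emission_lines.LIST) should be taken as the reference for the actual wavelengths and names.

```python
def read_emission_lines_list(text):
    """Return a list of (central wavelength in AA, name) tuples."""
    lines_list = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # First token is the wavelength, the rest of the line is the name.
        wavelength, name = line.split(maxsplit=1)
        lines_list.append((float(wavelength), name))
    return lines_list

# Hypothetical entries, for illustration only.
example = """6562.8 Halpha
5006.84 [OIII]5007
"""
print(read_emission_lines_list(example))
```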

Pack configuration file

As mentioned before, the content of the output files from the stellar absorption indices analysis (step 12) and the moment analysis (step 16) is hard coded in pyFIT3D. However, the other three (SSP, SFH and ELINES) depend on a configuration file that gathers the information to be packed into each of them. The file is a simple CSV file containing an ID, a FILE containing the map, a DESCRIPTION of the map, the TYPE of the property mapped and the UNIT of the property mapped.
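A pack configuration file with these five columns can be read with the standard csv module, as in the sketch below. It assumes one map per row, comma-separated columns in the order given above, no header row, and that rows starting with “#” are comments; these are assumptions, and the rows shown are hypothetical.

```python
import csv
import io

PACK_COLUMNS = ["ID", "FILE", "DESCRIPTION", "TYPE", "UNIT"]

def read_pack_config(stream):
    """Read a pack configuration CSV into a list of dictionaries."""
    reader = csv.DictReader(stream, fieldnames=PACK_COLUMNS, skipinitialspace=True)
    # Skip rows treated as comments (assumed convention: leading "#").
    return [row for row in reader if row["ID"] and not row["ID"].startswith("#")]

# Hypothetical rows, for illustration only.
example = io.StringIO(
    "0,map.V.fits,V-band intensity map,flux,10^-16 erg/s/cm^2\n"
    "1,map.age.fits,Luminosity-weighted age,age,log10(yr)\n"
)
rows = read_pack_config(example)
print(rows[0]["FILE"], "|", rows[0]["UNIT"])
```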

Run example

The pipeline script usage is:

$ ana_single.py
USE: ana_single.py NAME CONFIG_FILE [REDSHIFT] [X0] [Y0]
CONFIG_FILE is mandatory but defaults to ana_single.ini

The REDSHIFT and the central coordinates (X0 and Y0) are optional inputs that help the pipeline analysis. They can also be set inside the INI configuration file (CONFIG_FILE; ana_single_example.ini).

To run the pipeline analysis example, execute inside the IFS_analysis sub-directory:

$ pyPipe3D/examples/IFS_analysis> ana_single.py NGC2916 ana_single_example.ini

The example configuration is self-contained: it runs inside that sub-directory without depending on any other ancillary file.

Both the REDSHIFT and the central coordinates of this object are already written inside the configuration file. However, a single input configuration file can be used for an entire set of data cubes, in which case it is not practical to write those values inside the configuration file, since they are unique to each object. To run the same example forcing an input REDSHIFT of 0.012225 and central coordinates X0 = 36.37 and Y0 = 31.96, run:

$ pyPipe3D/examples/IFS_analysis> ana_single.py NGC2916 ana_single_example.ini 0.012225 36.37 31.96
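When one INI file is shared by many data cubes, the per-object values can be kept in a small table and passed on the command line. The sketch below only builds the argument lists following the usage message shown above; build_command and the targets dictionary are hypothetical helpers, and the call that would launch the pipeline is left as a comment.

```python
def build_command(name, config_file, redshift=None, x0=None, y0=None):
    """Argument list for: ana_single.py NAME CONFIG_FILE [REDSHIFT] [X0] [Y0]."""
    command = ["ana_single.py", name, config_file]
    # The optional arguments are positional, so stop at the first missing one.
    for value in (redshift, x0, y0):
        if value is None:
            break
        command.append(str(value))
    return command

# Per-object values kept outside the shared INI file.
targets = {"NGC2916": (0.012225, 36.37, 31.96)}
for name, (z, x0, y0) in targets.items():
    command = build_command(name, "ana_single_example.ini", z, x0, y0)
    print(" ".join(command))
    # subprocess.run(command, check=True) would launch the analysis.
```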

At the end of the example analysis the pipeline will create a directory called out and a subdirectory called out/NGC2916. All the output files are moved to directory out/NGC2916 and the five final data cubes (together with a couple of other important resulting files) are also copied to directory out. The names of the output directories and of the ancillary configuration directory and files are all configured through the pipeline INI configuration file.

New ALL SPAXELS pipeline scripts

As one of the new implementations using the suite of tools available in the pyFIT3D package, we created a new pipeline script that, through a new algorithm, analyzes the entire data cube spaxel-by-spaxel. It works with the outputs of the first and second SSP analyses (steps 9 and 11, see Pipeline procedure of analysis). After the stellar population analysis, the pipeline recalculates the stellar indices (step 12), recreates the ionized gas spectra data cube (step 10) and reruns all the steps related to the emission gas analysis.

Reanalyzing binned regions spaxel-by-spaxel (a.k.a. desegmentation analysis)

The new algorithm visits every binned spaxel, using the models of the 4 closest binned regions as its input SSP library. The neighborhood is built with the scipy.spatial.KDTree class applied to the segmentation bins, with depth 4 (the 3 closest neighbors + the bin model itself). For the non-linear analysis (kinematics and dust extinction), the guess values come from the previous output of the bin analysis. In addition, the SSP library used for the non-linear analysis is the same as at step 9. At the end of the analysis, it recreates all data cubes and property maps. It uses the same INI configuration file, but with some new options (CONFIG_FILE; ana_single_all_spaxels_example.ini).
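The neighborhood search can be illustrated with scipy.spatial.KDTree as below. The sketch assumes that each bin is represented by the (x, y) position of its center and that the 4 closest bins are found with a k = 4 query; how the pipeline actually defines the bin positions is not stated here, so the function neighbor_bins and the coordinates are illustrative only.

```python
import numpy as np
from scipy.spatial import KDTree

def neighbor_bins(bin_centers, k=4):
    """Indices of the k closest bins to every bin (the bin itself included)."""
    tree = KDTree(bin_centers)
    _distances, indices = tree.query(bin_centers, k=k)
    return indices

# Hypothetical (x, y) centers of five bins, in spaxel units.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
neighbors = neighbor_bins(centers)
print(neighbors[0])
```

Each row of neighbors starts with the bin itself (distance zero), followed by its 3 closest neighbors.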

Star Formation History (SFH) calculation

For each analyzed spaxel, the input library consists of 4 SSP models: the rest-frame model spectra of the neighborhood, calculated using the old SFH, i.e., the coefficients of the original SSP library employed at the step 11 analysis (\(w^{\rm old}\)). The fit yields 4 new weights for this temporary SSP library (\(w^{\rm new}\)). Finally, the coefficients of the new output model with respect to the original SSP library used for the analysis (\(w^{\rm org}\)) are calculated as the dot product between the new and old coefficients:

\[w_i^{\rm org} = \sum_j w_j^{\rm new} w_{j, i}^{\rm old}\]

Since \(w^{\rm old}\) and \(w^{\rm new}\) are normalized, no new normalization is needed. The error coefficients are calculated as:

\[\sigma(w)_i^{\rm org} = \sqrt{\sum_j \sigma(w)_j^{\rm new} \sigma(w)_{j, i}^{\rm old}}\]
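The two expressions above translate directly into matrix products, as in the following NumPy sketch. The function names and the toy weights are assumptions made for illustration; w_old is stored with one row per neighbor model (index j) and one column per SSP of the original library (index i).

```python
import numpy as np

def combine_weights(w_new, w_old):
    """w_org[i] = sum_j w_new[j] * w_old[j, i]."""
    return np.asarray(w_new, dtype=float) @ np.asarray(w_old, dtype=float)

def combine_errors(sigma_new, sigma_old):
    """sigma_org[i] = sqrt(sum_j sigma_new[j] * sigma_old[j, i])."""
    product = np.asarray(sigma_new, dtype=float) @ np.asarray(sigma_old, dtype=float)
    return np.sqrt(product)

# Toy case: 4 neighbor models built from an original library of 3 SSPs.
w_old = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])
w_new = np.array([0.5, 0.25, 0.25, 0.0])
w_org = combine_weights(w_new, w_old)
# Rows of w_old and w_new each sum to 1, so w_org also sums to 1.
print(w_org, w_org.sum())
```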