.. _pipeline_readme:

pyPipe3D's pipeline of IFS data cube analysis
=============================================

.. contents::

.. _pipeline_readme_present:

Presentation
------------

This document presents the sequence of procedures adopted by the **pyPipe3D** pipeline, which is designed for the analysis of a galaxy *IFS data cube*. The main products of the analysis are maps of the stellar population and ionized gas properties. The pipeline is standardized: the input data must be pre-processed, and some configuration files must be edited according to the required analysis.

Cube and row-stacked spectra (*RSS*) files
''''''''''''''''''''''''''''''''''''''''''

We define a *"data cube"* as a 3D FITS file in which the first two dimensions correspond to the spatial distribution of the pixels *(spaxels; X,Y)* and the third dimension carries the wavelength information of the IFU data *(Z)*. A *data cube* stored in this way has the same spatial step in the *X* and *Y* directions (normally in *arcsec*) and a constant wavelength step in the *Z* direction (normally in Angstroms).

The WCS information is stored in the corresponding header entries::

  CRPIX1  => Pixel of the starting position in RA (or X)
  CRVAL1  => Starting position in RA (or X)
  CDELT1  => Spatial extension of a pixel in the X-direction (in arcsec normally)
  CRPIX2  => Pixel of the starting position in DEC (or Y)
  CRVAL2  => Starting position in DEC (or Y)
  CDELT2  => Spatial extension of a pixel in the Y-direction (in arcsec normally)
  CRPIX3  => Pixel of the starting wavelength
  CRVAL3  => Starting Wavelength in AA
  CDELT3  => Wavelength step in AA per pixel

We define an *"RSS"* file, or *"Row Stacked Spectrum"*, as a 2D FITS file comprising *NY* individual spectra of the same length *(NX)*, with a common wavelength step and initial wavelength defined by the header keywords::

  CRPIX1  => Pixel of the starting wavelength
  CRVAL1  => Starting Wavelength in AA
  CDELT1  => Wavelength step in AA per pixel
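
As a minimal sketch of how these keywords map pixel indices to wavelengths, the snippet below assumes the usual linear FITS convention (1-based pixel indices) and uses a plain dictionary in place of a real FITS header. The helper name ``wavelength_axis`` and the header values are illustrative assumptions, not part of **pyPipe3D**. For a *data cube* the same relation applies with ``axis=3``.

.. code-block:: python

  import numpy as np

  def wavelength_axis(header, n_pixels, axis=1):
      """Return the wavelength (in AA) of every pixel along a FITS axis."""
      crpix = header[f"CRPIX{axis}"]
      crval = header[f"CRVAL{axis}"]
      cdelt = header[f"CDELT{axis}"]
      # FITS pixel indices start at 1, so pixel CRPIX has wavelength CRVAL.
      pixels = np.arange(1, n_pixels + 1)
      return crval + cdelt * (pixels - crpix)

  # Made-up header values for an RSS file (wavelength along axis 1).
  rss_header = {"CRPIX1": 1, "CRVAL1": 3749.0, "CDELT1": 2.0}
  wave = wavelength_axis(rss_header, n_pixels=5, axis=1)
  print(wave)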

.. _pipeline_readme_ssplibfits:

Format of the SSP library template FITS file
''''''''''''''''''''''''''''''''''''''''''''

The template libraries used by **pyFIT3D** are *RSS* files in which each row stores a single stellar population. In **pyFIT3D** they are read using the class `SSPModels <pyFIT3D.modelling.html#pyFIT3D.modelling.stellar.SSPModels>`_. The *SSP* spectra should share a common wavelength calibration, with the same step in AA per pixel and the same starting wavelength, defined by the header entries::

  CRPIX1
  CRVAL1
  CDELT1

Each *SSP* also requires a *NAME* header keyword, with the following format::

  NAME0 | spec_ssp_00.0891_z0004.spec
  NAME1 | spec_ssp_00.0891_z019.spec
  NAME2 | spec_ssp_00.0891_z030.spec
  NAME3 | spec_ssp_00.4467_z0004.spec

i.e., ``spec_ssp_AGE_zMET.spec``, where *AGE* is the *SSP* age in *Gyr* and *MET* is the metallicity expressed as its *decimal digits*, i.e.,

::

  Z/H = 0.0004  should be 0004
  Z/H = 0.02    should be 02

These entries are mandatory; the pipeline uses them to interpret the inputs.
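
The naming convention above can be decoded with a few lines of Python. This is only a sketch of the convention as described here: the function ``parse_ssp_name`` and the regular expression are hypothetical and are not taken from the **pyFIT3D** source.

.. code-block:: python

  import re

  # spec_ssp_AGE_zMET.spec: AGE in Gyr, MET as the decimal digits of Z.
  NAME_PATTERN = re.compile(r"^spec_ssp_(?P<age>[0-9.]+)_z(?P<met>[0-9]+)\.spec$")

  def parse_ssp_name(name):
      """Return (age in Gyr, metallicity) encoded in an SSP NAME entry."""
      match = NAME_PATTERN.match(name)
      if match is None:
          raise ValueError(f"unexpected SSP name: {name!r}")
      age_gyr = float(match.group("age"))
      # "0004" stands for 0.0004, "02" for 0.02.
      metallicity = float("0." + match.group("met"))
      return age_gyr, metallicity

  age, met = parse_ssp_name("spec_ssp_00.0891_z0004.spec")
  print(age, met)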

The normalization wavelength of the models can also be set in the header with the keyword ``WAVENORM``. If ``WAVENORM`` does not exist in the header, the class sweeps all the models looking for the wavelengths where the flux is closest to 1, calculates the median of those wavelengths and returns it.
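
The fallback described above can be illustrated as follows. This is one possible reading of that description, written with NumPy on made-up data; it is not the actual ``SSPModels`` code, and the function name ``guess_normalization_wavelength`` is hypothetical.

.. code-block:: python

  import numpy as np

  def guess_normalization_wavelength(wavelength, flux_models):
      """Median of the wavelengths where each model flux is closest to 1.

      flux_models has shape (n_models, n_wavelengths).
      """
      closest = np.argmin(np.abs(flux_models - 1.0), axis=1)
      return float(np.median(wavelength[closest]))

  # Made-up templates, all normalized around 5500 AA.
  wavelength = np.array([5000.0, 5250.0, 5500.0, 5750.0])
  flux_models = np.array([
      [0.8, 0.9, 1.0, 1.1],
      [1.3, 1.1, 1.0, 0.9],
      [0.5, 0.7, 1.0, 1.2],
  ])
  wavenorm = guess_normalization_wavelength(wavelength, flux_models)
  print(wavenorm)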

* `Click here for a series of examples of SSP templates that can be used in pyPipe3D <http://ifs.astroscu.unam.mx/pyPipe3D/templates/>`_

Notes on the input data preprocessing
'''''''''''''''''''''''''''''''''''''

*TODO: MISSING DETAILS*

.. _pipeline_readme_procedure:

Pipeline procedure of analysis
------------------------------

The pipeline analysis script is `ana_single.py <https://gitlab.com/pipe3d/pyPipe3D/-/blob/master/bin/ana_single.py>`_. When called, the script runs and writes all files to the current working directory. At the end of the complete analysis, it moves everything to the configured directories.

The analysis begins by reading a single input configuration file. However, other input files are needed along the way (e.g. *SSP* template libraries, configuration files for the emission line systems to be analyzed, the lists of output data to be stored in the final *data cubes*, the wavelengths masked during the analysis, etc.). The **pyPipe3D** repository includes a complete example of a data cube with all the needed configuration files. This example is described in the ":ref:`pipeline_readme_example`" section.

The pipeline analysis proceeds as follows:

1. Read the data cube (object spectra and FITS header). Error spectra, bad pixels and bad spaxels masks are also available.
2. Build the signal, noise and SN maps.
3. Create a slice (2D map) of the data cube (3D) representing the *V-Band*.
4. Create the spatial mask of the field-of-view.
5. Extract the central and integrated spectra.
6. Define the redshift range of the analysis.
7. Main **pyFIT3D** analysis of the central and integrated spectra.
8. CS-binning (adopted FoV segmentation; *RSS* creation).
9. First round of **pyFIT3D** analysis of the binned data (row-stacked spectra, *RSS*). *NOTE*: This analysis will produce the ionized gas modelled spectra and the estimation of the non-linear parameters (redshift, velocity dispersion and dust attenuation) of the stellar populations.
10. Create the ionized gas data cube and *RSS*.
11. Second round of **pyFIT3D** analysis of the binned data. *NOTE*: This analysis is performed over the gas-free spectrum (the residual of the difference between the observed and the gas model spectrum). Neither the non-linear fit nor the emission lines fit is performed during this procedure (their values are taken from the **step 9** analysis).
12. Stellar absorption indices analysis. *NOTE*: Measurement of certain stellar absorption strength indices, such as the Lick index system.
13. Create all maps of **pyFIT3D** analysis of the binned data.
14. **pyFIT3D** emission lines analysis of the ionized gas cube created at **step 10**.
15. Gzip all fits files (*.fits -> .fits.gz*).
16. Moment analysis of the ionized gas cube created at **step 10**.
17. Create the Mass, Age, Metallicity and *SFH* maps.
18. Create final product *data cubes*.
19. Move all files to output directories.

.. _pipeline_readme_outfiles:

Output files
------------

The main *data cubes* produced by the pipeline are divided into five FITS files. The information stored in three of them (*SSP*, *SFH* and *ELINES*) is gathered from the maps output by **steps 3, 4, 8, 9, 11** and **14**. Which maps (output files) are stored in which cube depends on the :ref:`pack configuration files <pack_config_file>`. The other two are output directly by **steps 12** (indices) and **16** (flux_elines). The calculated indices are hard coded in pyFIT3D; for the latter, however, the list of emission lines can be configured by the user.

Main *data cubes*
'''''''''''''''''

These are the five main *data cubes* produced by the **pyPipe3D** pipeline:

1. ``NAME.SSP.cube.fits.gz``: Data output from **steps 3**, **4**, **8**, **9** and **11**.

  * Main parameters derived from the analysis of the stellar populations, including the LW and MW ages and metallicities, dust attenuation and kinematics of the stellar populations.

2. ``NAME.SFH.cube.fits.gz``: Data output from **step 11**.

  * Weights of the decomposition of the stellar population for the adopted *SSP* template library. They can be used to derive the spatially resolved star formation and chemical enrichment histories of the galaxies, as well as the LW and MW properties included in the *SSP* data products.

3. ``NAME.ELINES.cube.fits.gz``: Data output from **step 14**.

  * Flux intensities for the nine strongest emission lines in the optical wavelength range, together with the kinematic properties of :math:`{\rm H}\alpha`, derived from a *Gaussian* fit of each emission line.

4. ``indices.NAME.cube.fits.gz``: Data output from **step 12**.

  * Set of stellar absorption indices maps.

5. ``flux_elines.NAME.cube.fits.gz``: Data output from **step 16**.

  * Maps of the main parameters of a set of more than 50 emission lines derived using a weighted moment analysis of the ionized gas spectra. This analysis depends on the kinematics of :math:`{\rm H}\alpha` derived at **step 14**.

Other output files
''''''''''''''''''

*TODO: MISSING DETAILS*

.. _pipeline_readme_example:

Example of the pipeline analysis
--------------------------------

We provide an example run of a galaxy IFS data cube analysis in the sub-directory ``examples/IFS_analysis``. The input data is the CALIFA IFS V500 data cube for the galaxy NGC 2916. Below we list the ancillary data needed for the analysis of the cube, together with a brief description of each file:

.. code-block:: console

  pyFIT3D/examples/IFS_analysis
  ├── ana_single_example.ini                 => input configuration file
  ├── config                                 => ancillary configuration files directory
  │   ├── auto_ssp_V500_no_lines.config      => auto_ssp (pyFIT3D) configuration file (no emission lines to be fitted)
  │   ├── auto_ssp_V500_several.config       => auto_ssp (pyFIT3D) configuration file
  │   ├── auto_ssp_V500_several_SII_z.config => auto_ssp (pyFIT3D) configuration file
  │   ├── cont_V500.config                   => emission lines configuration file without emission lines (only continuum)
  │   ├── emission_lines.LIST                => emission lines list for the moment analysis
  │   ├── emission_lines.txt                 => emission lines to be masked during pyFIT3D analysis
  │   ├── Ha_SII_V500.config                 => [NII]+Ha+[SII] emission lines system configuration file
  │   ├── Ha_V500.config                     => [NII]+Ha emission lines system configuration file
  │   ├── Hd_V500.config                     => Hd emission lines system configuration file
  │   ├── Hg_V500.config                     => Hg emission lines system configuration file
  │   ├── mask_elines.txt                    => wavelengths intervals masked during pyFIT3D analysis
  │   ├── OIII_V500.config                   => [OIII]+Hb emission lines system configuration file
  │   ├── OII_V500.config                    => [OII] emission lines system configuration file
  │   ├── pack_CS_inst_disp.csv              => pack configuration file for SSP data cube.
  │   ├── pack_elines_v1.5.csv               => pack configuration file for ELINES data cube.
  │   ├── pack_SFH.csv                       => pack configuration file for SFH data cube.
  │   ├── SII_V500.config                    => [SII] emission lines system configuration file
  │   └── slice_V.conf                       => V-band slice
  ├── data                                   => input data cube directory
  │   └── NGC2916.V500.rscube.fits.gz        => input data cube FITS file
  ├── README.txt                             => This file.
  └── ssp                                    => input SSP templates for the analysis
      ├── gsd01_12.fits                      => GSD (12 templates - 3 ages, 4 metallicities)
      ├── gsd01_156.fits                     => GSD (156 templates - 39 ages, 4 metallicities)
      └── miles_2_gas.fits                   => MILES (6 templates - 3 ages, 3 metallicities)

Configuration files
'''''''''''''''''''

The present version of **pyFIT3D** is designed so that old configuration files and scripts continue to run. Some inputs are no longer needed, although the configuration parser in the present version of the code still requires them. In future versions, the idea is to reduce those files to the minimum. Those still needed will be redesigned to be read with the Python `configparser <https://docs.python.org/3/library/configparser.html#module-configparser>`_ module. The new analysis pipeline already uses this module as its configuration file parser (see :ref:`pipeline_run` below). The following sections describe the required input configuration files.

.. _eml_config_file:

Emission lines system configuration file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The emission line fitting algorithm of **pyFIT3D** needs a configuration file describing the model to fit. The model is built from *functions*; in this context, the most relevant ones are **eline** and **poly1d**. Every configuration file should start with a line with four entries. The first one is unused and should be "0", the second defines the number of *functions* that make up the model, the third is the goal :math:`\chi^2` and the fourth is the goal minimum variation of the :math:`\chi^2` between iterations (i.e., the convergence criterion).

After that line, the first row of each block defines the *function* (e.g., **eline** for the emission lines), and the following 9 rows configure the input parameters of that *function*. Each parameter needs 5 entries: (1) the *input guess* of the parameter, (2) a *flag* indicating whether it is fitted [1] or kept fixed [0], (3) and (4) the *minimum and maximum* values allowed for the parameter, and (5) a possible *link* of this parameter to the same parameter of another *function*.

If there is no *link*, entry (5) is set to "-1"; otherwise, it gives the position, within the configuration file, of the *function* to which the parameter is linked.

**pyFIT3D** considers two kinds of *links*: (a) additive links and (b) multiplicative links. For a linked parameter, entry (5) holds the number of the *function* in the configuration file to which the current *function* is *linked*, and entry (4) should be set to "0" (*additive link*) or "1" (*multiplicative link*). Entry (3) then becomes the value to be *added* (case a) or *multiplied* (case b) to the *linked function parameter* to obtain the *parameter of the current function*.
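
The link rules can be summarized in a short sketch. It follows the description above literally and is not the **pyFIT3D** implementation: the helper ``resolve_linked_value`` is hypothetical, and counting the *functions* from 1 is an assumption, since the text does not state whether the numbering starts at 0 or 1.

.. code-block:: python

  def resolve_linked_value(row, function_values):
      """Return the value of one parameter, following its link if present.

      row is [guess, flag, entry3, entry4, link] for the parameter and
      function_values holds the value of the same parameter for every
      function, in the order they appear in the configuration file.
      """
      guess, _flag, link_value, link_type, link = row
      if link == -1:
          return guess
      # ASSUMPTION: functions are numbered starting at 1.
      reference = function_values[int(link) - 1]
      if link_type == 0:   # additive link
          return reference + link_value
      if link_type == 1:   # multiplicative link
          return reference * link_value
      raise ValueError("entry (4) must be 0 or 1 for a linked parameter")

  # Made-up values: velocity tied to function 1, intensity scaled by 0.34.
  velocity = resolve_linked_value([0.0, 0, 0.0, 0, 1], [5000.0])
  intensity = resolve_linked_value([0.0, 0, 0.34, 1, 1], [30.0])
  print(velocity, intensity)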

If a function has fewer than 9 parameters, the unused rows must still be included as::

  0	 0	 0	 0	 -1

until all 9 rows are filled.

For an **eline** the parameters are the following ones::

  eline
  CENTRAL_WAVELENGTH	 0	 0	 0	 -1
  INTEGRATED_INTENSITY	 1	 -0.1	 1e10	 -1
  SIGMA_OF_THE_GAUSSIAN	 0	 4.0	 4.5	 -1
  SYSTEMIC_VELOCITY	 1	 4200	  7800	 -1
  0	 0	 0	 0	 -1
  0	 0	 0	 0	 -1
  0	 0	 0	 0	 -1
  0	 0	 0	 0	 -1
  0	 0	 0	 0	 -1

For a **poly1d** the parameters are the coefficients of a polynomial function (with the 1st one being a constant).
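
To make the layout concrete, the following sketch parses a configuration with a single **eline** *function*. It only reflects the format described in this section; the parser ``parse_elines_config`` is hypothetical (it is not the **pyFIT3D** reader) and all numeric values in the example are made up.

.. code-block:: python

  def parse_elines_config(text):
      """Parse the header line and the 10-row blocks (name + 9 parameters)."""
      rows = [line.split() for line in text.strip().splitlines() if line.strip()]
      _unused, n_functions, chi_goal, d_chi_goal = rows[0]
      functions = []
      cursor = 1
      for _ in range(int(n_functions)):
          name = rows[cursor][0]
          # Each parameter row: guess, flag, min, max, link.
          parameters = [[float(v) for v in row] for row in rows[cursor + 1:cursor + 10]]
          functions.append({"type": name, "parameters": parameters})
          cursor += 10
      return {
          "chi_goal": float(chi_goal),
          "d_chi_goal": float(d_chi_goal),
          "functions": functions,
      }

  EXAMPLE = """\
  0 1 0.2 0.001
  eline
  6562.68 0 0 0 -1
  1.0 1 -0.1 1e10 -1
  4.2 0 4.0 4.5 -1
  6000 1 4200 7800 -1
  0 0 0 0 -1
  0 0 0 0 -1
  0 0 0 0 -1
  0 0 0 0 -1
  0 0 0 0 -1
  """
  config = parse_elines_config(EXAMPLE)
  print(config["functions"][0]["type"], len(config["functions"][0]["parameters"]))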

* :math:`{\rm [NII]+H}\alpha` `configuration file example <https://gitlab.com/pipe3d/pyPipe3D/-/tree/master/examples/IFS_analysis/config/Ha_V500.config>`_.

* `A series of configuration file examples: <http://ifs.astroscu.unam.mx/pyPipe3D/config_files/>`_

.. _auto_ssp_config_file:

Stellar population synthesis configuration file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The stellar continuum fitting algorithm in **pyFIT3D** (which can be executed for a single spectrum using the script `ana_spec.py <https://gitlab.com/pipe3d/pyPipe3D/-/tree/master/bin/ana_spec.py>`_) needs a configuration file. It carries the required entries for the **redshift**, **velocity dispersion** and **dust attenuation** (*guess value*, *step* of each variation, *min* value, *max* value), followed by the entries that configure the emission line fit: the *number of systems to fit* (i.e., groups of emission lines) and then the same number of rows with the definitions required to run the :ref:`emission lines fit <eml_config_file>` (*START_W END_W MASK_FILE CONFIG_FILE XXX none XXX XXX*). The last two rows are deprecated. The configuration file should have the following format::

  REDSHIFT DELTA_REDSHIFT MIN_REDSHIFT MAX_REDSHIFT XXX XXX XXX XXX MIN_WAVELENGTH_KIN MAX_WAVELENGTH_KIN
  SIGMA DELTA_SIGMA MIN_SIGMA MAX_SIGMA
  AV DELTA_AV MIN_AV MAX_AV
  N_SYSTEMS
  START_W END_W MASK_FILE CONFIG_FILE XXX none XXX XXX
  (2, ..., N_SYSTEMS-1)
  START_W END_W MASK_FILE CONFIG_FILE XXX none XXX XXX
  XXX XXX XXX
  XXX XXX
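
A minimal sketch of a reader for this layout is shown below. It is based only on the format above: the function ``parse_auto_ssp_config`` is hypothetical, the ``XXX`` placeholders and the two deprecated rows are ignored, and every value in the example is made up.

.. code-block:: python

  def parse_auto_ssp_config(text):
      """Read the non-linear parameter rows and the emission line systems."""
      rows = [line.split() for line in text.strip().splitlines()]
      keys = ("guess", "delta", "min", "max")
      n_systems = int(rows[3][0])
      systems = [
          {
              "start_w": float(row[0]),
              "end_w": float(row[1]),
              "mask_file": row[2],
              "config_file": row[3],
          }
          for row in rows[4:4 + n_systems]
      ]
      return {
          "redshift": dict(zip(keys, map(float, rows[0][:4]))),
          "wavelength_kin": (float(rows[0][8]), float(rows[0][9])),
          "sigma": dict(zip(keys, map(float, rows[1][:4]))),
          "av": dict(zip(keys, map(float, rows[2][:4]))),
          "n_systems": n_systems,
          "systems": systems,
      }

  EXAMPLE = """\
  0.0122 0.0001 0.010 0.014 0 0 0 0 3800 4700
  2.5 0.5 1.2 9.0
  0.2 0.1 0.0 1.6
  2
  6530 6630 mask_elines.txt Ha_V500.config 0 none 0 0
  4800 5050 mask_elines.txt OIII_V500.config 0 none 0 0
  0 0 0
  0 0
  """
  config = parse_auto_ssp_config(EXAMPLE)
  print(config["redshift"], [s["config_file"] for s in config["systems"]])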

* `Example of the stellar continuum fit configuration file <https://gitlab.com/pipe3d/pyPipe3D/-/tree/master/examples/IFS_analysis/config/auto_ssp_V500_several.config>`_.

Moment analysis emission lines list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This file contains a list of the *central wavelengths* of the emission lines analyzed by the moment analysis (**step 16**). It is a simple file comprising the *central wavelength* and a *name* for each emission line, with the following format::

  CENTRAL_WAVELENGTH NAME
  CENTRAL_WAVELENGTH NAME
  (...)
  CENTRAL_WAVELENGTH NAME
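
Such a list can be loaded with a few lines of Python, as sketched below. The reader ``read_emission_lines_list`` is hypothetical, and the wavelengths and names in the example are only illustrative.

.. code-block:: python

  def read_emission_lines_list(text):
      """Return a list of (central wavelength in AA, name) tuples."""
      lines = []
      for row in text.splitlines():
          if not row.strip():
              continue
          # The first column is the wavelength, the rest is the name.
          wavelength, name = row.split(maxsplit=1)
          lines.append((float(wavelength), name.strip()))
      return lines

  EXAMPLE = """\
  4861.32 Hbeta
  5006.84 [OIII]5007
  6562.68 Halpha
  """
  emission_lines = read_emission_lines_list(EXAMPLE)
  print(emission_lines)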

* `Example of an emission lines list file <https://gitlab.com/pipe3d/pyPipe3D/-/tree/master/examples/IFS_analysis/config/emission_lines.LIST>`_

.. _pack_config_file:

Pack configuration file
^^^^^^^^^^^^^^^^^^^^^^^

As mentioned before, the content of the output files from the stellar absorption indices analysis (**step 12**) and the moment analysis (**step 16**) is hard coded in **pyFIT3D**. The other three (*SSP*, *SFH* and *ELINES*), however, each depend on a configuration file listing the information to be packed into them. The file is a simple `CSV file <https://en.wikipedia.org/wiki/Comma-separated_values>`_ containing an **ID**, a **FILE** containing the map, a **DESCRIPTION** of the map, the **TYPE** of the property mapped and the **UNIT** of the property mapped.

* `Example of the SSP maps pack configuration file <https://gitlab.com/pipe3d/pyPipe3D/-/raw/master/examples/IFS_analysis/config/pack_CS_inst_disp.csv>`_.
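
As an illustration, a pack configuration with the five columns listed above could be read with the standard ``csv`` module. This sketch assumes that the file has no header row and that the columns appear in the order given in the text; both are assumptions, and the rows in the example are made up.

.. code-block:: python

  import csv
  import io

  # ASSUMPTION: no header row, columns in the order described in the text.
  PACK_COLUMNS = ("ID", "FILE", "DESCRIPTION", "TYPE", "UNIT")

  def read_pack_config(stream):
      """Return one dictionary per map to be packed into the cube."""
      reader = csv.DictReader(stream, fieldnames=PACK_COLUMNS, skipinitialspace=True)
      return list(reader)

  # Made-up rows standing in for a real pack configuration file.
  example = io.StringIO(
      "0,map_a.fits,Example property A,flux,arbitrary\n"
      "1,map_b.fits,Example property B,velocity,km/s\n"
  )
  entries = read_pack_config(example)
  for entry in entries:
      print(entry["ID"], entry["FILE"], entry["UNIT"])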

.. _pipeline_run:

Run example
'''''''''''

The pipeline script usage is:

.. code-block:: console

  $ ana_single.py
  USE: ana_single.py NAME CONFIG_FILE [REDSHIFT] [X0] [Y0]
  CONFIG_FILE is mandatory but defaults to ana_single.ini

The **REDSHIFT** and the central coordinates (**X0** and **Y0**) are optional inputs that help the pipeline analysis. This information can also be set inside the *INI* configuration file (**CONFIG_FILE**; `ana_single_example.ini <https://gitlab.com/pipe3d/pyPipe3D/-/tree/master/examples/IFS_analysis/ana_single_example.ini>`_).

To run the pipeline analysis example, execute the following inside the ``IFS_analysis`` sub-directory:

.. code-block:: console

  $ pyPipe3D/examples/IFS_analysis> ana_single.py NGC2916 ana_single_example.ini

The example configuration is fully prepared to run inside this directory, independently of any other ancillary file.

Both the **REDSHIFT** and the central coordinates of this object are already written inside the configuration file. However, a single input configuration file can be used for an entire set of *data cubes*. In that case, it is not practical to write those values directly inside the configuration file, since they are unique to each object. To run the same example forcing an input **REDSHIFT** of *0.012225* and central coordinates of **X0** = *36.37* and **Y0** = *31.96*, run:

.. code-block:: console

  $ pyPipe3D/examples/IFS_analysis> ana_single.py NGC2916 ana_single_example.ini 0.012225 36.37 31.96

At the end of the example analysis the pipeline creates a directory called ``out`` and a subdirectory called ``out/NGC2916``. All the output files are moved to the directory ``out/NGC2916``, and the five final *data cubes* (together with a couple of other important resulting files) are also copied to the directory ``out``. The names of the output directories and of the ancillary configuration directory and files are all set in the pipeline *INI* configuration file.
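
When one configuration file is shared by a set of *data cubes*, the per-object values can be supplied on the command line, following the usage message shown earlier. The sketch below only builds the argument lists; the helper ``build_command`` and the ``targets`` table are hypothetical, and the commands could then be passed to ``subprocess.run``.

.. code-block:: python

  import shlex

  def build_command(name, config_file, redshift=None, x0=None, y0=None):
      """Build the ana_single.py argument list for one object."""
      command = ["ana_single.py", name, config_file]
      # The optional arguments are positional: stop at the first missing one.
      for value in (redshift, x0, y0):
          if value is None:
              break
          command.append(str(value))
      return command

  # Per-object values: (REDSHIFT, X0, Y0).
  targets = {"NGC2916": (0.012225, 36.37, 31.96)}
  commands = [
      build_command(name, "ana_single_example.ini", *values)
      for name, values in targets.items()
  ]
  print(shlex.join(commands[0]))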

.. _pipeline_all_spaxels:

New ALL SPAXELS pipeline scripts
--------------------------------

As one of the new implementations using the suite of tools available in the **pyFIT3D** package, we created a `new pipeline script <scripts/ana_single_all_spaxels.py>`_ which, through a new algorithm, analyzes the entire data cube *spaxel-by-spaxel*. It works with the outputs of the first and second SSP analyses (*steps 9 and 11*, see :ref:`pipeline_readme_procedure`). After the stellar population analysis, the pipeline recalculates the stellar indices (*step 12*), recreates the *ionized gas spectra data cube* (*step 10*) and reruns all *emission gas analysis related steps*.

Reanalyzing binned regions spaxel-by-spaxel (a.k.a. desegmentation analysis)
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

The new analysis algorithm visits every binned spaxel, using the models of the *4 closest binned regions* as its input **SSP library**. To build the neighborhood it uses the `scipy.spatial.KDTree class <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html>`_ on the segmentation bins with depth 4 (*the 3 closest neighbors + the bin model itself*). For the non-linear analysis (kinematics and dust extinction), the guess values come from the previous output of the bin analysis. In addition, the **SSP library** used for the non-linear analysis is the same as the one used at *step 9*. At the end of the analysis, it recreates all data cubes and property maps. It uses the same *INI* configuration file, with some new options (**CONFIG_FILE**; `ana_single_all_spaxels_example.ini <scripts/ana_single_all_spaxels_example.ini>`_).
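
The neighborhood search can be illustrated with ``scipy.spatial.KDTree``. The sketch below assumes that each binned region is represented by a centroid position, which the text does not specify; the coordinates are made up and the code is not taken from the pipeline script.

.. code-block:: python

  import numpy as np
  from scipy.spatial import KDTree

  # ASSUMPTION: one (x, y) centroid per binned region, in spaxel units.
  bin_centroids = np.array([
      [10.0, 10.0],
      [12.0, 10.0],
      [10.0, 13.0],
      [20.0, 20.0],
      [11.0, 11.0],
      [30.0, 5.0],
  ])
  tree = KDTree(bin_centroids)

  # Position of the spaxel being reanalyzed (it lies inside bin 0).
  spaxel = np.array([10.2, 10.1])

  # k=4 returns the closest bin plus its 3 nearest neighbors.
  distances, neighbors = tree.query(spaxel, k=4)
  print(neighbors)

The models of the bins listed in ``neighbors`` would then play the role of the temporary 4-model library used to fit that spaxel.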

Star Formation History (SFH) calculation
''''''''''''''''''''''''''''''''''''''''

For each analyzed spaxel, the input library consists of *4 SSP models*: the rest-frame model spectra of the neighborhood, calculated using the old SFH, i.e., the coefficients of the original **SSP library** employed in the *step 11* analysis (:math:`w^{\rm old}`). The fit yields 4 new weights for this *temporary SSP library* (:math:`w^{\rm new}`). Finally, the coefficients of the new output model with respect to the original SSP library (:math:`w^{\rm org}`) are calculated as the dot product between the new and old coefficients:

.. math::

  w_i^{\rm org} = \sum_j w_j^{\rm new} w_{j, i}^{\rm old}

Since :math:`w^{\rm old}` and :math:`w^{\rm new}` are normalized, no new normalization is needed. The error coefficients are calculated as:

.. math::

  \sigma(w)_i^{\rm org} = \sqrt{\sum_j \sigma(w)_j^{\rm new} \sigma(w)_{j, i}^{\rm old}}
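
Both expressions reduce to matrix products, as the NumPy sketch below shows for a made-up case with 4 neighbor models and an original library of 3 *SSPs*. The array names and values are illustrative, and the error term simply follows the expression above as written.

.. code-block:: python

  import numpy as np

  # w_old[j, i]: weight of original SSP i in neighbor model j (rows sum to 1).
  w_old = np.array([
      [0.50, 0.50, 0.00],
      [0.20, 0.30, 0.50],
      [0.00, 1.00, 0.00],
      [0.25, 0.25, 0.50],
  ])
  # w_new[j]: weight of neighbor model j in the spaxel fit (sums to 1).
  w_new = np.array([0.4, 0.3, 0.2, 0.1])

  # w_org[i] = sum_j w_new[j] * w_old[j, i]
  w_org = w_new @ w_old

  # Made-up uncertainties, combined as in the expression for sigma(w)^org.
  sigma_old = np.full_like(w_old, 0.05)
  sigma_new = np.array([0.02, 0.02, 0.01, 0.01])
  sigma_org = np.sqrt(sigma_new @ sigma_old)

  print(w_org, w_org.sum())
  print(sigma_org)

Because every row of ``w_old`` and the vector ``w_new`` sum to one, ``w_org`` also sums to one, consistent with the remark that no new normalization is needed.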

.. README created by:
.. Eduardo Alberto Duarte Lacerda
.. mailto: dhubax@gmail.com