Tuesday A: Run H(x) in a testing environment
============================================

1. Introduction
---------------

This practical session introduces the JEDI Unified Forward Operator
(UFO) code and will teach you how to configure and run a forward
operator. You will then experiment with running operators on several
radiance and conventional instruments. Finally, you will also make a few
plots showing results.

This activity assumes that you have successfully completed Monday's
activities. You should have a working build of the fv3-jedi bundle. You
also should have access to the same academy node as before, either using
the JupyterLab environment (recommended) or SSH.

Access your AWS instance and enter the Singularity container
------------------------------------------------------------

Connect to your assigned compute node. You will use the same method as
yesterday.

You already have the singularity container that contains the JEDI
dependencies. Enter the container using:

.. code:: bash

    cd ~/
    singularity shell -e jedi-gnu-openmpi-dev_latest.sif

Once in the container, be sure to also remove the limits on stack and
virtual memory to prevent spurious failures, as noted in yesterday's
:doc:`introductory exercise <getting-started>`.

.. code:: bash

    ulimit -s unlimited
    ulimit -v unlimited

2. Review of YAML structure
---------------------------

Programmers and computers typically store data as complex "objects"
(`structures and
classes <http://www.cplusplus.com/doc/tutorial/structures/>`__). In a
computer's memory, these objects may have very complicated storage
involving pointers, references, dictionaries, and similar constructs.
However, when we need to store these complex structures on a disk or
send them across a network, we have to translate them into a series of
bytes; that is, we
`serialize <https://en.wikipedia.org/wiki/Serialization>`__ an object
into a `byte stream <https://en.wikipedia.org/wiki/Bitstream>`__.
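
As a small illustration of serialization, the following Python sketch uses
the standard-library ``json`` module (not YAML, and not part of JEDI) to
turn an in-memory dictionary into bytes and back again. The dictionary
contents are made up for this example.

.. code:: python

    import json

    # A hypothetical in-memory object: a dictionary holding a list and a number.
    config = {"name": "example", "components": ["oops", "ioda", "ufo"], "pi": 3.14159}

    # Serialize: object -> text -> bytes that could be written to disk
    # or sent across a network.
    payload = json.dumps(config).encode("utf-8")

    # Deserialize: bytes -> text -> an equivalent in-memory object.
    restored = json.loads(payload.decode("utf-8"))

    print(type(payload).__name__)
    print(restored == config)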

There are many ways to do this. However, JEDI wanted to employ a
consistent, well-documented format that is easy for people to edit and
for machines to read. So, we chose to use the YAML Ain't Markup Language
(YAML) format to store the configuration data for the JEDI project.

`YAML <https://yaml.org/about.html>`__ was developed in 2001 and has
been implemented for use with `several <https://yaml.org/>`__
programming languages.

Let's take a look at a YAML file for a brief overview.

.. code:: yaml

    ---
    # Comments are indicated with the '#' symbol.
    name: "Your name here" # A string
    a-boolean-value: true
    an-integer-value: 3
    pi: 3.14159
    list-of-some-jedi-components:
      - saber
      - oops
      - ioda
      - ufo
    dictionary-of-places-to-explore-in-a-staycation:
      - local-park:
          scenic: true
          features:
            - "Running trails"
            - Trees
            - "Duck pond"
      - aquarium:
          types-of-animals:
            - jellyfish
            - turtles
            - fish
          free: false
          mask: true
          # TODO: Explore this area and add more details.

The file starts with three dashes. These dashes indicate the start of a
new YAML document. YAML supports multiple documents, and compliant
parsers will recognize each set of dashes as the beginning of a new one.

Comments start with a hash sign ("#") and extend to the end of the
line.

Next, we see the construct that makes up most of a typical YAML
document: a key-value pair. "name" is a key that points to a string
value: "Your name here". YAML allows for several types of values:
strings, integers, floating-point numbers, boolean values, and dates are
all acceptable.

Strings can optionally be enclosed in quotes; both single and double
quotes are accepted.

You can also add in arrays/lists. Each element in a list is denoted by
an opening dash.

YAML elements can also be nested. This lets you emulate a group/folder
structure. Nesting is accomplished by adding levels of spaces (no tabs
allowed).

See `this
link <https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html>`__
for more examples.
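
For readers who think in Python, the sketch below writes out by hand the
kind of nested structure a YAML parser typically produces for the first
part of the example above: mappings become dictionaries and lists become
lists. No YAML library is used here; the literal is typed manually for
illustration only.

.. code:: python

    # Hand-written Python equivalent of part of the YAML example above.
    parsed = {
        "name": "Your name here",
        "a-boolean-value": True,
        "an-integer-value": 3,
        "pi": 3.14159,
        "list-of-some-jedi-components": ["saber", "oops", "ioda", "ufo"],
    }

    # Values in a mapping are looked up by key; list elements by position.
    print(parsed["name"])
    print(parsed["list-of-some-jedi-components"][2])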

3. Download and explore sample data
-----------------------------------

Sample data are available for download from our AWS data repository.
Download these files into a dedicated tutorial directory, untar the
data archive, and change into the extracted directory.

.. code:: bash

    mkdir -p ~/jedi/tutorials
    cd ~/jedi/tutorials
    wget https://fv3-jedi-public.s3.amazonaws.com/Academy/1.1.0/tutorial_obs_data.tar
    tar xf tutorial_obs_data.tar
    cd tutorial_obs_data

The ``tutorial_obs_data`` directory contains excerpts of data generated from
previous HofX model runs. Our radiance data are from a November 1, 2020,
12Z model run, and our conventional data are from December 15, 2020,
00Z. There are five subdirectories here, ``crtm``, ``geoval``, ``obs``,
``answers``, and ``aux_files``.

-  The ``obs`` subdirectory contains observations from various
   instruments, such as AMSU-A and ATMS. These observation files are
   stored in IODA's internal file format. Observation files can range in
   size from a few kilobytes to many megabytes. Some files store only a
   few observations, while others may contain millions.
-  The ``geoval`` directory contains model state information that has
   been interpolated to the observation locations. GeoVaLs are
   "Geophysical Values at Locations." In an ordinary JEDI run, we
   generate our own GeoVaLs in memory by consulting the model, but to
   save time in this practical we prepopulate our data from a previous
   invocation of JEDI.
-  The ``crtm`` directory contains CRTM coefficient data used by the
   radiative transfer model.
-  The ``answers`` directory contains "hints" in case you get stuck when
   writing your YAML files.
-  The ``aux_files`` directory contains auxiliary files for satellite
   biases and lapse rate information.

4. Create a YAML file to run the CRTM operator on AMSU-A data
-------------------------------------------------------------

We have a large number of observations available for radiance
instruments. One of the most common instruments is AMSU-A, which has
flown aboard the Aqua, MetOp-A, MetOp-B, MetOp-C, and NOAA-15 through
NOAA-19 satellites. Let's consider MetOp-C, which launched in
November 2018. The observation file is
``amsua_metop-c_obs_2020110112.nc4``, and the GeoVaLs file is
``amsua_metop-c_geoval_2020110112.nc4``.

We are going to write the YAML that instructs UFO's H(x) testing
application, ``test_ObsOperator.x``, to read the testing file, run CRTM,
and then store its simulated brightness temperatures. We will plot these
simulated brightness temperatures and will also compare them against some
data generated using NOAA's Gridpoint Statistical Interpolation (GSI)
system.

Since we would like to avoid modifying our testing data, first create a
new directory for our experiment.

.. code:: bash

    mkdir -p ~/tutorial_3_experiments
    cd ~/tutorial_3_experiments

Create a new YAML file, and name it ``amsua_metop-c_gfs_HofX.yaml``.

Insert this text into the new YAML file:

.. code:: yaml

    window begin: 2020-11-01T09:00:00Z
    window end: 2020-11-01T15:00:00Z

    observations:
    - obs operator:
        name: CRTM
        Absorbers: [H2O,O3,CO2]
        Clouds: [Water, Ice]
        Cloud_Fraction: 1.0
        obs options:
          inspectProfile: 1
          Sensor_ID: amsua_metop-c
          EndianType: little_endian
          CoefficientPath: /home/ubuntu/jedi/tutorials/tutorial_obs_data/crtm/
      obs space:
        name: amsua_metop-c
        obsdatain:
          obsfile: /home/ubuntu/jedi/tutorials/tutorial_obs_data/obs/amsua_metop-c_obs_2020110112.nc4
        obsdataout:
          obsfile: /home/ubuntu/tutorial_3_experiments/out-amsua_metop-c_obs_2020110112.nc4
        simulated variables: [brightness_temperature]
        channels: 1-15
      geovals:
        filename: /home/ubuntu/jedi/tutorials/tutorial_obs_data/geoval/amsua_metop-c_geoval_2020110112.nc4
      vector ref: GsiHofX
      tolerance: 1.e-7

In a YAML file, indentation is important, so please ensure that your
file looks like this example. Also please ensure that your indents use
spaces instead of tabs.

The different keys and groupings in the YAML file have meaning.

- The first two lines, ``window begin`` and ``window end``, tell IODA the
  bounds of your assimilation window. All observations outside of this
  window are dropped.
- The ``observations:`` line denotes that we are
  specifying a set of observation operators for the application to run.
  For this first example, we are only attempting to run a single
  observation operator. This operator is described on lines 5-14. We are
  invoking the CRTM operator. When CRTM performs its calculations, it will
  assume that the atmosphere has three absorbing gases, water vapor,
  ozone, and carbon dioxide. Water and ice clouds may both exist.
- The ``obs options`` section provides additional information to properly run
  CRTM. Each instrument needs various ancillary data files that contain
  information about the sensor's channels, polarizations, spectral
  response functions, and so on. For AMSU-A on MetOp-C, the data are
  stored in a special ``Data/`` directory. The ``amsua_metop-c`` files
  provide appropriate coefficients for our run. Note that occasionally
  there may be more than one set of available coefficients, and CRTM users
  are invited to read the CRTM documentation to determine which
  coefficients are appropriate.
- The ``obs space`` section describes the
  input data that we are using with the operator. The observation data
  file is specified using the ``obsfile`` key in the ``obsdatain``
  section. The results of the operator can optionally be written to a
  file. This occurs when an ``obsdataout`` section appears in the YAML.
  The syntax of ``obsdatain`` and ``obsdataout`` is identical.
- The ``simulated variables`` and ``channels`` sections tell UFO that you want
  to simulate brightness temperatures for instrument channels 1-15.
- The ``geovals`` section provides interpolated model values at the observed
  locations. This is a "shortcut" for the JEDI system to avoid reading
  full model backgrounds, and this is very useful when developing a new
  operator or when incrementally implementing bias correction and quality
  control filters. For the purposes of this practical exercise (i.e. to
  keep runtimes short), we provide geovals files.
- The final two lines (``vector ref`` and ``tolerance``) allow us to specify a final "check"
  in our test application to verify that our simulated results match those
  of another system. In this case, we are matching against GSI's H(x)
  operator and want to ensure that our CRTM calculations match theirs. If
  the reference check is not specified, then no check is performed.
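
The time-window filtering described in the first bullet can be sketched in
a few lines of Python. This is not IODA's code: the helper names, the
sample observation times, and the treatment of the window boundaries
(begin exclusive, end inclusive) are all assumptions made for illustration.

.. code:: python

    from datetime import datetime, timezone

    def parse_utc(stamp):
        """Parse a timestamp such as 2020-11-01T09:00:00Z into an aware datetime."""
        return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

    window_begin = parse_utc("2020-11-01T09:00:00Z")
    window_end = parse_utc("2020-11-01T15:00:00Z")

    def in_window(stamp):
        # Assumption: begin is exclusive and end is inclusive;
        # the rule actually applied by IODA may differ.
        return window_begin < parse_utc(stamp) <= window_end

    # Made-up observation times: one before, one inside, one after the window.
    obs_times = ["2020-11-01T08:30:00Z", "2020-11-01T12:00:00Z", "2020-11-01T15:30:00Z"]
    kept = [t for t in obs_times if in_window(t)]
    print(kept)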

5. Run the test application
---------------------------

The test application is named ``test_ObsOperator.x``. It exists in your
JEDI build directory (``~/jedi/build/bin``). It takes one command-line
argument: the path to your YAML file. You could run the application
directly, but because you are processing many AMSU-A observations, it is
faster to run it in parallel within an MPI environment.

You can execute the program by typing this:

.. code:: bash

    mpiexec -n 4 ~/jedi/build/bin/test_ObsOperator.x ~/tutorial_3_experiments/amsua_metop-c_gfs_HofX.yaml

On the console you will notice a large amount of output. Eventually, the
application should complete. If any errors are indicated (these are
**highlighted in red** on the console), please ask for help to see what
went wrong. Usually, there is a bad file path or a typo in the YAML.

6. Check the results
--------------------

Among the diagnostic print statements, you can find how much UFO's H(x) (:code:`hofx`) differs from the reference set in the YAML, which in this case is GSI's H(x) (selected by :code:`vector ref: GsiHofX` in the YAML above). Try to locate the line shown here:

.. code:: bash

    Test     : Vector difference between reference and computed: amsua_metop-c nobs= 136095 Min=-4.49052e-05, Max=4.49696e-05, RMS=5.95927e-06

This line reports the minimum, maximum, and root-mean-square (RMS) differences between the brightness temperatures simulated by UFO and by GSI. The comparison is performed over all 15 channels together (remember that our YAML sets :code:`channels: 1-15`). The line also reports the number of observations (:code:`nobs`). Given the channel configuration and the number of observations, we can conclude that the test covers 9073 locations with 15 channels each (:math:`9073 \times 15 = 136095`).
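
If you want to pull these numbers out of a saved log, a regular expression
is enough. The sketch below parses the exact line shown above and repeats
the locations-times-channels arithmetic; the pattern is written for this
one line and is only assumed to fit other lines of the same form.

.. code:: python

    import re

    line = ("Test     : Vector difference between reference and computed: "
            "amsua_metop-c nobs= 136095 Min=-4.49052e-05, Max=4.49696e-05, RMS=5.95927e-06")

    # Named groups capture the observation count and the three statistics.
    pattern = re.compile(
        r"nobs=\s*(?P<nobs>\d+)\s+Min=(?P<min>[-+0-9.eE]+),"
        r"\s*Max=(?P<max>[-+0-9.eE]+),\s*RMS=(?P<rms>[-+0-9.eE]+)"
    )
    match = pattern.search(line)
    nobs = int(match.group("nobs"))
    vmin, vmax, rms = (float(match.group(k)) for k in ("min", "max", "rms"))

    # With channels 1-15, nobs should be a whole multiple of 15.
    n_channels = 15
    n_locations = nobs // n_channels
    print(n_locations, nobs % n_channels)

    # Largest absolute difference between UFO and the reference.
    print(max(abs(vmin), abs(vmax)))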

You may have noticed from the YAML defined and used in the previous
sections that there was an ``obsdataout`` section in it. That section
specifies an ``obsfile`` template name to save the output files of the
run. So, let's change the current directory to the one where those files
are supposed to be saved and check them. On the console, you can change
the directory and list the files there:

.. code:: bash

    cd /home/ubuntu/tutorial_3_experiments
    ls

You are expected to see a list of files similar to the following:

.. code:: bash

    amsua_metop-c_gfs_HofX.yaml
    out-amsua_metop-c_obs_2020110112_0000.nc4
    out-amsua_metop-c_obs_2020110112_0001.nc4
    out-amsua_metop-c_obs_2020110112_0002.nc4
    out-amsua_metop-c_obs_2020110112_0003.nc4

If you recall, the ``obsfile`` template was defined as
``/home/ubuntu/tutorial_3_experiments/out-amsua_metop-c_obs_2020110112.nc4``
for this ``amsua_metop-c`` case. The names of the files that you see in
your console follow that template, but there are four files, each with
an underscore and a four-digit number appended to the base name (e.g.,
``_0000``). This is because you ran the application on four processor
elements, and the program distributes the observations from the input
file among them; each processor element writes its own output file.
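
The naming pattern seen in the listing can be reproduced with a short
Python sketch. The function name ``rank_file_names`` is hypothetical, and
the rule it encodes (a zero-padded, four-digit rank inserted before the
file extension) is inferred from the file listing above rather than taken
from the IODA source.

.. code:: python

    from pathlib import Path

    def rank_file_names(template, nprocs):
        """Insert a zero-padded rank suffix before the file extension."""
        path = Path(template)
        return [f"{path.stem}_{rank:04d}{path.suffix}" for rank in range(nprocs)]

    # Same template and process count as in the run above.
    names = rank_file_names("out-amsua_metop-c_obs_2020110112.nc4", 4)
    for name in names:
        print(name)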

To avoid overwriting files, it's important to create a folder to store
the plots that will be drawn from the information inside these IODA
files. You can do this on the console with the following commands:

.. code:: bash

    mkdir amsua_metop-c
    cd amsua_metop-c

Once inside the folder, let's generate a figure showing some results
from our run. To do this, we need to run the following command:

.. code:: bash

    ~/jedi/tutorials/tutorial_obs_data/script/plot_from_iodav2_hofx.py \
       --hofxfiles ../out-amsua_metop-c_obs_2020110112_NPROC.nc4 \
       --nprocs 4 \
       --window_begin 2020110109 \
       --variable hofx/brightness_temperature_10

The above command invokes the plotting script with a list of
arguments, described below:

- ``--hofxfiles ../out-amsua_metop-c_obs_2020110112_NPROC.nc4``: the
  template for IODA file names (with ``_NPROC`` appended to it)
- ``--nprocs 4``: number of processor elements used to run the application
  (same number of IODA files)
- ``--window_begin 2020110109``: the
  timestamp of the beginning of the window (following the YYYYMMDDHH
  template)
- ``--variable hofx/brightness_temperature_10``: the variable
  to be plotted (in this case the ``hofx`` of ``brightness_temperature``
  for channel 10)
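
The ``--window_begin`` value is the ``window begin`` entry of the YAML
rewritten in the YYYYMMDDHH form. A minimal Python sketch of that
conversion follows; ``to_window_begin_arg`` is a hypothetical helper name
and is not part of the plotting script.

.. code:: python

    from datetime import datetime

    def to_window_begin_arg(yaml_stamp):
        """Convert e.g. 2020-11-01T09:00:00Z to the YYYYMMDDHH form."""
        parsed = datetime.strptime(yaml_stamp, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.strftime("%Y%m%d%H")

    # Window start used in the AMSU-A YAML above.
    print(to_window_begin_arg("2020-11-01T09:00:00Z"))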

The command above is expected to generate a figure
(``brightness_temperature_10_hofx.png``) showing the spatial
distribution of simulated brightness temperatures from ``amsua_metop-c``
for channel 10. You can view this figure in your JupyterLab environment
by using the file explorer pane on the left side of your web browser
window. Navigate to the ``/tutorial_3_experiments/amsua_metop-c``
directory in the file pane and you should be able to open and view the
output plot.

.. image:: ./images/file_pane_1.png
   :width: 200
   :alt: Open the file explorer pane

.. image:: ./images/file_pane_2.png
   :width: 200
   :alt: Navigate to the folder

.. figure:: ./images/brightness_temperature_10_hofx.png
   :alt: AMSU-A channel 10 H(x) output plot

Similarly, we can generate a figure showing the same quantity as
previously computed by GSI. This quantity was used as the reference in
our test when running the application, and it is stored in the IODA
files in a group named ``GsiHofX``. To create the figure, we run the
plotting script again with slightly different arguments:

.. code:: bash

    ~/jedi/tutorials/tutorial_obs_data/script/plot_from_iodav2_hofx.py \
       --hofxfiles ../out-amsua_metop-c_obs_2020110112_NPROC.nc4 \
       --nprocs 4 \
       --window_begin 2020110109 \
       --variable GsiHofX/brightness_temperature_10

To view this figure in JupyterLab, you may need to first refresh the
file pane.

.. image:: ./images/file_pane_3.png
   :width: 200
   :alt: Refresh button

.. figure:: ./images/brightness_temperature_10_gsihofx.png
   :alt: AMSU-A channel 10 GSI H(x) output plot

You may have noticed that in the above command we only changed the
variable being plotted (from ``hofx/brightness_temperature_10`` to
``GsiHofX/brightness_temperature_10``). At first glance, the newly
generated figure for GSI looks very similar to the one generated
previously for JEDI. They are qualitatively the same, but how different
are they quantitatively? We can generate another figure showing the
difference using the following command:

.. code:: bash

    ~/jedi/tutorials/tutorial_obs_data/script/plot_from_iodav2_hofx.py \
       --hofxfiles ../out-amsua_metop-c_obs_2020110112_NPROC.nc4 \
       --nprocs 4 \
       --window_begin 2020110109 \
       --variable GsiHofX/brightness_temperature_10 \
       --jediminusgsi True

.. figure:: ./images/brightness_temperature_10_hofx_gsihofx.png
   :alt: AMSU-A channel 10 JEDI H(x) minus GSI H(x) output plot

The above command is almost identical to the one that we used to
generate the figure for GSI, except that an additional
argument (``--jediminusgsi True``) has been passed to make the
plotting script plot the difference (JEDI minus GSI) in the
simulated brightness temperature of channel 10. You can return to this
section later and explore these plots for other channels if you have
time.
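
The summary statistics for such a difference field are straightforward to
compute by hand. The sketch below uses made-up brightness temperatures
(they are not taken from the tutorial files) to show how the minimum,
maximum, and RMS of JEDI minus GSI are obtained.

.. code:: python

    import math

    # Made-up brightness temperatures in kelvin, for illustration only.
    jedi_hofx = [210.0, 215.5, 220.25, 218.0]
    gsi_hofx = [210.5, 215.5, 219.75, 218.0]

    # Element-wise difference, JEDI minus GSI.
    diff = [j - g for j, g in zip(jedi_hofx, gsi_hofx)]

    # Root-mean-square of the differences.
    rms = math.sqrt(sum(d * d for d in diff) / len(diff))

    print(min(diff), max(diff), rms)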

7. Run the CRTM operator on ATMS data
-------------------------------------

The Advanced Technology Microwave Sounder (ATMS) is a newer
instrument that flies aboard Suomi-NPP and NOAA-20 (and eventually
NOAA-21). ATMS combines the capabilities of microwave temperature
sounders (like AMSU-A) and microwave humidity sounders (like MHS) in a
single package. It has 22 channels from 23 GHz to 183 GHz.

Try going through the previous YAML configuration & program execution
steps, but this time run the operator on ATMS data instead of AMSU-A
data. You will need to:

- create a new YAML file (or make a fresh copy of the AMSU-A file from
  before),
- update the CRTM YAML block to use ATMS coefficients,
- update the channel numbers,
- provide appropriate paths to the ATMS obs and geovals files, and
- specify a path for the output data.

Each of the files that you will need is in the same directories as
their AMSU-A equivalents.
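
If you prefer to generate the repetitive parts of the YAML rather than
edit them by hand, a template can help. The Python sketch below fills in
the ``obs space`` and ``geovals`` blocks using the AMSU-A values from
earlier in this session. The placeholder names are hypothetical, and the
assumption that other sensors follow the same file-naming pattern should
be checked by listing the ``obs`` and ``geoval`` directories.

.. code:: python

    from string import Template

    # Placeholders ($sensor, $cycle, ...) are names invented for this sketch.
    OBS_SPACE = Template("""\
      obs space:
        name: $sensor
        obsdatain:
          obsfile: $data_dir/obs/${sensor}_obs_$cycle.nc4
        obsdataout:
          obsfile: $out_dir/out-${sensor}_obs_$cycle.nc4
        simulated variables: [brightness_temperature]
        channels: $channels
      geovals:
        filename: $data_dir/geoval/${sensor}_geoval_$cycle.nc4
    """)

    # Values copied from the AMSU-A example; change them for another sensor.
    text = OBS_SPACE.substitute(
        sensor="amsua_metop-c",
        cycle="2020110112",
        channels="1-15",
        data_dir="/home/ubuntu/jedi/tutorials/tutorial_obs_data",
        out_dir="/home/ubuntu/tutorial_3_experiments",
    )
    print(text)

The ``obs operator`` block is left out on purpose, because the CRTM
settings for a new sensor still need to be chosen by hand.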

Run the ``test_ObsOperator.x`` application with your new YAML file. Have
fun making plots of the output.

You can also experiment with plotting different channels.

- Many of the lower ATMS channels are particularly good at detecting the Earth's
  surface, and these channels show pronounced differences between land,
  ocean, and ice. [ :download:`Example image <./images/atms_ch2.png>` ]
- ATMS channel 6 shows cross-swath bias effects. Bias correction will be
  discussed in this afternoon's tutorial. [ :download:`Example image <./images/atms_ch6.png>` ]
- ATMS channels 17 and 18 (165.5 and 183.31 GHz, respectively) are particularly sensitive to clouds.
  [ :download:`Example image 1 <./images/atms_ch17.png>` ] [ :download:`Example image 2 <./images/atms_ch18.png>` ]

8. Run a conventional operator
------------------------------

There are many observation operators available within JEDI.

An important observation operator often used for conventional
observations is the vertical interpolation operator. This operator is
named inside UFO as ``VertInterp`` and it performs a linear vertical
interpolation according to a given vertical coordinate. An example of
its usage is when we want to simulate horizontal wind components
obtained through satellites, the so-called ``satwinds``. To be
specific, ``satwinds`` here refers to horizontal wind
components obtained through the Atmospheric Motion Vectors (AMV) technique, which
essentially derives these wind components by identifying the movement of
multiple patterns in a sequence of satellite images. It's important to
mention that this operator performs its vertical interpolation in
logarithmic space when the vertical coordinate is pressure, which is the
case for satellite winds here.
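
To make "linear interpolation in logarithmic space" concrete, here is a
minimal Python sketch. It is not UFO's implementation: the function name,
the made-up pressure levels and winds, and the choice to raise an error
outside the column are all assumptions for illustration.

.. code:: python

    import math

    def interp_log_pressure(p_obs, p_levels, values):
        """Linearly interpolate values to p_obs in ln(pressure).

        p_levels must be strictly increasing (top of the atmosphere first).
        """
        if not p_levels[0] <= p_obs <= p_levels[-1]:
            raise ValueError("observation pressure is outside the model column")
        for k in range(len(p_levels) - 1):
            if p_levels[k] <= p_obs <= p_levels[k + 1]:
                # Weight is the fractional distance between the levels in ln(p).
                w = (math.log(p_obs) - math.log(p_levels[k])) / (
                    math.log(p_levels[k + 1]) - math.log(p_levels[k]))
                return (1.0 - w) * values[k] + w * values[k + 1]

    # Made-up column: pressures in Pa and eastward wind in m/s.
    p_levels = [25000.0, 50000.0, 100000.0]
    u_wind = [30.0, 20.0, 10.0]

    # An observation exactly on a model level returns that level's value.
    print(interp_log_pressure(50000.0, p_levels, u_wind))

Because the weight is computed from ``ln(p)``, an observation at the
geometric mean of two level pressures lands halfway between the two level
values, rather than one at the arithmetic mean.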

For a final exercise, try running the ``VertInterp`` operator on a small
subset of our satwinds data.

Examine and use the following YAML. The ``obs operator`` section and
``simulated variables`` lines are subtly different from when we invoked
CRTM, but the overall structure is the same.

.. code:: yaml

    window begin: 2020-12-14T21:00:00Z
    window end: 2020-12-15T03:00:00Z

    observations:
    - obs operator:
        name: VertInterp
      obs space:
        name: Satwind
        obsdatain:
          obsfile: /home/ubuntu/jedi/tutorials/tutorial_obs_data/obs/satwind_obs_2020121500_m.nc
        obsdataout:
          obsfile: /home/ubuntu/tutorial_3_experiments/out-satwind_obs_2020121500_m.nc
        simulated variables: [eastward_wind, northward_wind]
      geovals:
        filename: /home/ubuntu/jedi/tutorials/tutorial_obs_data/geoval/satwind_geoval_2020121500_m.nc
      vector ref: GsiHofX
      tolerance: 1.0e-02

After running the application with this YAML, generate plots of the ``eastward_wind`` and
``northward_wind`` variables. You can also make plots of observations
minus background (O-B). Note the ``colmin`` and ``colmax`` options, which set the range of the color bar
to sensible values.

.. code:: bash

    mkdir -p ~/tutorial_3_experiments/satwind
    cd ~/tutorial_3_experiments/satwind

    ~/jedi/tutorials/tutorial_obs_data/script/plot_from_iodav2_hofx.py --hofxfiles ../out-satwind_obs_2020121500_m_NPROC.nc --nprocs 4 --window_begin 2020121421 --colmin -45 --colmax 45 --variable hofx/northward_wind
    ~/jedi/tutorials/tutorial_obs_data/script/plot_from_iodav2_hofx.py --hofxfiles ../out-satwind_obs_2020121500_m_NPROC.nc --nprocs 4 --window_begin 2020121421 --colmin -45 --colmax 45 --variable hofx/eastward_wind

    ~/jedi/tutorials/tutorial_obs_data/script/plot_from_iodav2_hofx.py --hofxfiles ../out-satwind_obs_2020121500_m_NPROC.nc --nprocs 4 --window_begin 2020121421 --variable hofx/northward_wind --omb true
    ~/jedi/tutorials/tutorial_obs_data/script/plot_from_iodav2_hofx.py --hofxfiles ../out-satwind_obs_2020121500_m_NPROC.nc --nprocs 4 --window_begin 2020121421 --variable hofx/eastward_wind --omb true

.. figure:: ./images/satwind_eastward.png
   :alt: Eastward wind

.. figure:: ./images/satwind_northward.png
   :alt: Northward wind