Thursday B: Adding QC filter and QC filter test to JEDI

Introduction

This practical is on the JEDI Unified Forward Operator (UFO) code and quality control (QC) filters and tests. It has three main parts. First, you will create a feature branch in the ufo repository. Second, you will add a new simple QC filter to UFO. Third, you will add a test for your new QC filter.

Filters are an essential component in a data assimilation workflow. Filters can change quality control flags (i.e., to reject or retain observations) and observation error variances (e.g., one might wish to increase observation error variances to decrease the observation weight in the analysis instead of rejecting observations altogether). In JEDI, filters are customizable and generic. This means that you can use the same code (written in C++) to accomplish different tasks (specified by you in a YAML file).
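As a textbook illustration of why inflating the error variance reduces an observation's weight, consider a scalar analysis in which the observation y measures the state directly, with background x_b, background error variance \sigma_b^2, and observation error variance \sigma_o^2. This is a simplification, not the JEDI variational solver:

```latex
x_a = x_b + K\,(y - x_b), \qquad K = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}
```

Inflating the observation error variance to \alpha\sigma_o^2 with \alpha > 1 gives K' = \sigma_b^2 / (\sigma_b^2 + \alpha\sigma_o^2) < K, so the observation pulls the analysis less. Rejecting the observation corresponds to the limit \sigma_o^2 \to \infty, where K \to 0.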

This activity assumes that you have completed the previous activities and still have access to a JupyterLab or SSH session.

Step 1: Access your AWS instance and enter the Singularity container

Connect to your assigned compute node. You will use the same method as yesterday.

You already have the Singularity container that contains the JEDI dependencies. Enter the container using:

cd ~/
singularity shell -e jedi-gnu-openmpi-dev_latest.sif

Once in the container, be sure to also remove the limits on stack memory and virtual memory to prevent spurious failures, as noted before:

ulimit -s unlimited
ulimit -v unlimited

Step 2: Make a new feature branch in the UFO repository

You ran ecbuild in Monday’s Getting Started activity, and it cloned the stable branches of several repositories. In this tutorial, however, we want to modify the UFO code. In JEDI, we aim to follow the “git flow” paradigm when developing; we will discuss this in depth in a later lecture on Friday. In summary, the develop branch contains the development version of each repository, and this version of the code should always build and test successfully. Whenever you want to add a new feature to the code, you should do your work in a separate branch of the repository. Once the work is done, you can issue a “Pull Request” so that other JEDI users can review your code and merge your changes into the develop branch. With every release of the JEDI code, we create a snapshot of the JEDI repositories by marking the state of the develop branch with a git “tag” (a fixed name that always refers to the same commit, unlike a branch, which moves forward as new commits are added).

The meanings of “master” (or “main”), “develop”, and “tags” will be discussed in the git-flow lecture and a later practical exercise. Because of the ordering of the lectures, and because we want stable, reproducible academy exercises, we have a special copy of the UFO repository in the jedi-da-academy organization on GitHub. In this repository, the 1.1.0 tag and the develop branch are identical; this is not the case in actual development. Ordinarily, the top-level CMakeLists.txt file would reference BRANCH develop, rather than TAG 1.1.0, for each package.

Open the top-level CMakeLists.txt file in the source code (~/jedi/fv3-bundle/CMakeLists.txt). Change line 49 from:

ecbuild_bundle( PROJECT ufo   GIT "https://github.com/jedi-da-academy/ufo.git"   TAG 1.1.0 )

to:

ecbuild_bundle( PROJECT ufo   GIT "https://github.com/jedi-da-academy/ufo.git"   BRANCH feature/new_qc_test_<yourname> )

Then, enter the source code’s ufo subdirectory (cd ~/jedi/fv3-bundle/ufo). NOTE: There is also a ufo directory in the build directory at ~/jedi/build-release/ufo; this is not the directory that you want.

Checkout the develop branch and then create a new branch as follows:

git checkout develop
git checkout -b feature/new_qc_test_<yourname>

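The -b option makes git checkout create a new branch starting from the commit you currently have checked out (here, the tip of develop) and switch to it, so your new branch effectively begins as a copy of develop.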

Don’t forget to set LOCAL_PATH_JEDI_TESTFILES; otherwise, the test data will not be linked correctly. This step is only needed for the practical sessions; in other cases, CMake downloads and links the correct version of the test data.

export LOCAL_PATH_JEDI_TESTFILES=$HOME/jedi/test-data-release

Step 3: Add a new filter

We are going to re-implement a simplified version of the Bounds Check filter. This filter checks that observation data are within certain user-specified bounds. You can refer to the JEDI documentation for more details about creating a new filter.

Step 3a: The backend logic

Navigate into the ~/jedi/fv3-bundle/ufo/src/ufo/filters directory. Copy the DifferenceCheck.cc and DifferenceCheck.h files to PracticalBoundsCheck.cc and PracticalBoundsCheck.h, respectively.

Open these files in your editor of choice.

In PracticalBoundsCheck.h:

  • Rename all references of DifferenceCheck to PracticalBoundsCheck. Search for all possible capitalizations: don’t forget the all-uppercase text (DIFFERENCECHECK) on lines 8, 9, and 87!

  • Change the line int qcFlag() const override {return QCflags::diffref;} to return a different flag: QCflags::bounds. This QC flag is conveniently already defined in ufo/filters/QCflags.h.

  • Remove lines with ref and val parameters. We do not use them in this filter.
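Changing qcFlag() matters because, as we understand the UFO filter framework, the value it returns is what gets recorded in the QC flags of every observation your filter rejects. The sketch below illustrates that idea outside JEDI. SketchFlag, SketchFilterBase, and the Sketch* filter classes are hypothetical stand-ins (not the real ufo::FilterBase or ufo/filters/QCflags.h), and the rule that an already-rejected observation keeps its earlier flag is an assumption of this simplified model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flag values for illustration only; the real ones are defined
// in ufo/filters/QCflags.h.
enum class SketchFlag { pass, diffref, bounds };

// Hypothetical, heavily simplified filter base class: after a filter decides
// which observations to flag, the base class writes the subclass's qcFlag()
// into each flagged observation that is still passing.
class SketchFilterBase {
 public:
  virtual ~SketchFilterBase() = default;
  virtual SketchFlag qcFlag() const = 0;

  void setFlags(const std::vector<bool> & flagged,
                std::vector<SketchFlag> & flags) const {
    assert(flagged.size() == flags.size());
    for (std::size_t jobs = 0; jobs < flags.size(); ++jobs) {
      // Assumption of this sketch: an earlier rejection is not overwritten.
      if (flagged[jobs] && flags[jobs] == SketchFlag::pass) flags[jobs] = qcFlag();
    }
  }
};

// The copied filter before editing: rejections are labelled diffref.
class SketchDifferenceCheck : public SketchFilterBase {
 public:
  SketchFlag qcFlag() const override { return SketchFlag::diffref; }
};

// After the edit: the same machinery now labels rejections as bounds.
class SketchPracticalBoundsCheck : public SketchFilterBase {
 public:
  SketchFlag qcFlag() const override { return SketchFlag::bounds; }
};
```

Because qcFlag() is virtual, the shared flag-setting code never needs to know which filter it is running; overriding one line is enough to change how rejections are labelled.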

In PracticalBoundsCheck.cc:

  • Rename all references of DifferenceCheck to PracticalBoundsCheck.

  • Remove lines with ref and val parameters. We do not use them in this filter.

  • In PracticalBoundsCheck::applyFilter(...), replace the function body with something like this:

  ufo::Variables testvars;
  testvars += ufo::Variables(filtervars, "ObsValue");

  // Retrieve the bounds.
  const float missing = util::missingValue(missing);
  const float vmin = parameters_.minvalue.value().value_or(missing);
  const float vmax = parameters_.maxvalue.value().value_or(missing);

  // Sanity checks
  if (filtervars.nvars() == 0) {
    oops::Log::error() << "No variables will be filtered out in filter "
                       << config_ << std::endl;
    ABORT("No variables specified to be filtered out in filter");
  }

  // Loop over all variables to filter
  for (size_t jv = 0; jv < testvars.nvars(); ++jv) {
    //  get test data for this variable
    std::vector<float> testdata;
    data_.get(testvars.variable(jv), testdata);
    //  apply the filter
    for (size_t jobs = 0; jobs < obsdb_.nlocs(); ++jobs) {
      if (apply[jobs]) {
        ASSERT(testdata[jobs] != missing);
        if (vmin != missing && testdata[jobs] < vmin) flagged[jv][jobs] = true;
        if (vmax != missing && testdata[jobs] > vmax) flagged[jv][jobs] = true;
      }
    }
  }

  • Feel free to customize the function further.
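To see the bounds logic in isolation, here is a minimal standalone sketch of the same loop. It assumes plain std::vector containers in place of the JEDI data structures, std::optional<float> for the optional minvalue/maxvalue parameters, and a hypothetical missing-value sentinel kMissingSketch (in UFO the real value comes from util::missingValue). The name boundsCheckSketch is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical missing-value sentinel; in UFO this comes from util::missingValue.
constexpr float kMissingSketch = std::numeric_limits<float>::lowest();

// Standalone sketch of the PracticalBoundsCheck::applyFilter loop.
//   apply[jobs]        : whether the filter applies at location jobs
//   testdata[jv][jobs] : ObsValue of filter variable jv at location jobs
//   flagged[jv][jobs]  : set to true when the value is out of bounds
void boundsCheckSketch(const std::vector<bool> & apply,
                       const std::vector<std::vector<float>> & testdata,
                       std::optional<float> minvalue,
                       std::optional<float> maxvalue,
                       std::vector<std::vector<bool>> & flagged) {
  // An unset bound falls back to the sentinel, which disables that check.
  const float vmin = minvalue.value_or(kMissingSketch);
  const float vmax = maxvalue.value_or(kMissingSketch);
  for (std::size_t jv = 0; jv < testdata.size(); ++jv) {
    for (std::size_t jobs = 0; jobs < apply.size(); ++jobs) {
      if (!apply[jobs]) continue;
      assert(testdata[jv][jobs] != kMissingSketch);
      // Bounds are inclusive: only values strictly outside [vmin, vmax] are flagged.
      if (vmin != kMissingSketch && testdata[jv][jobs] < vmin) flagged[jv][jobs] = true;
      if (vmax != kMissingSketch && testdata[jv][jobs] > vmax) flagged[jv][jobs] = true;
    }
  }
}
```

For example, with a single variable holding 10, 11, ..., 19 and bounds [14.0, 19.0], the values 10 to 13 are flagged and 14 to 19 are kept. Passing std::nullopt for one bound disables that side of the check, just as value_or(missing) does in the filter.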

Step 3b: Add your new filter to the build system

  • Edit ~/jedi/fv3-bundle/ufo/src/ufo/filters/CMakeLists.txt and add PracticalBoundsCheck.cc and PracticalBoundsCheck.h to the filters_files list.

  • UFO needs to be told that another filter is available. The list of known filters is located in ~/jedi/fv3-bundle/ufo/src/ufo/instantiateObsFilterFactory.h.

    To add in the new filter, first add #include "ufo/filters/PracticalBoundsCheck.h" to the top of instantiateObsFilterFactory.h.

    At the end of instantiateObsFilterFactory.h, follow the pattern and add in:

    static oops::FilterMaker<OBS, oops::ObsFilter<OBS, ufo::PracticalBoundsCheck> >
             practicalBoundsCheckMaker("Practical Bounds Check");
    
  • The filter is added!
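The FilterMaker line works through self-registration: constructing the static maker object adds the filter type to a factory under the name "Practical Bounds Check", and the value of the filter: key in a YAML file is later looked up by that exact string. Below is a minimal sketch of the pattern; every name in it (FilterSketch, filterRegistry, FilterMakerSketch, createFilter) is hypothetical and is not the actual oops API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical minimal filter interface (not the real oops::ObsFilter API).
struct FilterSketch {
  virtual ~FilterSketch() = default;
  virtual std::string name() const = 0;
};

using FilterCreator = std::function<std::unique_ptr<FilterSketch>()>;

// Registry mapping YAML filter names to creator functions. A function-local
// static guarantees the map exists before any static maker registers into it.
std::map<std::string, FilterCreator> & filterRegistry() {
  static std::map<std::string, FilterCreator> registry;
  return registry;
}

// Constructing a maker registers filter type T under the given name.
template <typename T>
struct FilterMakerSketch {
  explicit FilterMakerSketch(const std::string & filterName) {
    assert(filterRegistry().count(filterName) == 0 && "duplicate filter name");
    filterRegistry()[filterName] = [] { return std::make_unique<T>(); };
  }
};

// Create a filter from the string given after "filter:" in the YAML file.
std::unique_ptr<FilterSketch> createFilter(const std::string & filterName) {
  const auto it = filterRegistry().find(filterName);
  if (it == filterRegistry().end()) {
    throw std::runtime_error("Unknown filter: " + filterName);
  }
  return it->second();
}

struct PracticalBoundsCheckSketch : FilterSketch {
  std::string name() const override { return "PracticalBoundsCheckSketch"; }
};

// Analogue of the static practicalBoundsCheckMaker in instantiateObsFilterFactory.h.
static FilterMakerSketch<PracticalBoundsCheckSketch>
    practicalBoundsCheckMakerSketch("Practical Bounds Check");
```

In this model, a typo in either the maker's string or the YAML filter: value shows up as an "unknown filter" error at run time rather than at compile time, so the two strings must match exactly.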

Step 4: Compile your code

Finally, return to the build directory ($HOME/jedi/build-release) and run ecbuild again. We need to re-run ecbuild because we added new source files to UFO. To suppress the Perl warnings at the ecbuild stage, you can set LANG=C:

export LANG=C
cd $HOME/jedi/build-release
ecbuild ../fv3-bundle

Once ecbuild completes, verify that it reports that configuration has succeeded. If the configuration step has succeeded you should see a line like this:

-- Build files have been written to: /home/ubuntu/jedi/build-release

Now that you have modified the ufo source code, recompile it. To save a little time, you can go directly to the ufo directory and just compile that:

cd $HOME/jedi/build-release/ufo
make -j8

If an error is reported, review the console to see what went wrong. If you do not know what to fix, please ask for help.

Once the build succeeds, run ctest from the UFO build directory and make sure all tests pass. Step 5 provides more details about testing in JEDI.

Step 5: Testing in JEDI

Each JEDI repository has its own suite of tests. In this step, we introduce some ctest commands that can help you test and debug your code. Please refer to the JEDI documentation for more information on the JEDI test suite. After building the bundle, you can run the tests using ctest:

cd <build-directory>
ctest

Here <build-directory> is $HOME/jedi/build-release. To run only the UFO tests, cd into the ufo subdirectory of the build directory and run ctest there:

cd $HOME/jedi/build-release/ufo
ctest

After the tests are complete, ctest will print out a summary, highlighting which tests, if any, failed. To run a single test, you can use -R followed by the test’s name, for example:

ctest -R ufo_coding_norms
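The argument to -R is a regular expression that can match anywhere in a test name, not an exact name. The following sketch mimics that selection with std::regex_search. selectTests is a hypothetical helper, and because ctest uses its own regular-expression engine rather than std::regex, only simple patterns are guaranteed to behave identically.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return the test names that a "ctest -R <pattern>" call would select:
// a name is selected if the pattern matches any part of it.
std::vector<std::string> selectTests(const std::vector<std::string> & names,
                                     const std::string & pattern) {
  const std::regex re(pattern);
  std::vector<std::string> selected;
  for (const auto & name : names) {
    if (std::regex_search(name, re)) selected.push_back(name);
  }
  return selected;
}
```

With the test names ufo_coding_norms, test_ufo_geovals, and test_ufo_qc_gen_practical_boundscheck, the pattern practical selects only the last one, while ufo selects all three. To run exactly one test, anchor the pattern, e.g. ctest -R '^ufo_coding_norms$'.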

The output from these tests is printed to the screen and written to the file LastTest.log in the directory <build-directory>/Testing/Temporary (in this example, $HOME/jedi/build-release/ufo/Testing/Temporary). In the same directory, LastTestsFailed.log lists the tests that failed in the last run. You can run ctest with the verbose option to get more information, which can be helpful for debugging:

ctest -V -R test_ufo_geovals

and for extra verbose:

ctest -VV -R test_ufo_geovals

ctest also has an option to only re-run the tests that failed last time:

ctest --rerun-failed

A note on the ufo_coding_norms test

This test runs cpplint, a command-line tool that checks C/C++ files for style issues based on Google’s C++ style guide. We follow several rules from this style guide to keep the code we write readable by other people.

If you see an error in the ufo_coding_norms test, this indicates that the style checker has detected an issue. To view the output of a failed ufo_coding_norms test, run:

ctest -V -R ufo_coding_norms

Then, apply any fixes to your code, rerun make -j8, and run the test again.

Keep in mind that when you add a new feature to a JEDI repository, you need to write a test for your code. This ensures that your code works properly, and it will help us review and merge your code more quickly. You will add a test for your new filter in the next section of this practical.

Step 6: Add YAML configuration file for the new test

We will use filters_testdata.nc4, a simplified IODA-format file, to test our new filter. Take a look at this dataset using the h5dump command:

cd <build-directory>/ufo/test/Data/ufo/testinput_tier_1
h5dump filters_testdata.nc4 | less

Here <build-directory> is $HOME/jedi/build-release.

To test your filter, you first need to add a YAML configuration file in your source directory $HOME/jedi/fv3-bundle/ufo/test/testinput. In this directory, YAML configuration files with the prefix qc_ are used for testing various filters in UFO. In the YAML configuration file, you specify the details of how you want to test your filter, for example, the name of the observation file and the list of variables in that file to which you want to apply the filter.

Create a new YAML file called qc_practical_boundscheck.yaml in $HOME/jedi/fv3-bundle/ufo/test/testinput, and copy and paste the following into it. If you are using the vim editor, it may help to open the editor and immediately type :set paste so that the indentation shown below is preserved.

window begin: 2018-01-01T00:00:00Z
window end: 2019-01-01T00:00:00Z

observations:
- obs space:
    name: test data
    obsdatain:
      obsfile: Data/ufo/testinput_tier_1/filters_testdata.nc4
    simulated variables: [variable1, variable2, variable3]
  obs filters:
  - filter: Practical Bounds Check        # test min/max value with all variables
    filter variables:
    - name: variable1
    - name: variable2
    - name: variable3
    minvalue: 14.0
    maxvalue: 19.0
# Compare variables with minvalue/maxvalue
#  variable1@ObsValue = 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
#  variable2@ObsValue = 10, 12, 14, 16, 18, 20, 22, 24, 26, 28
#  variable3@ObsValue = 25, 24, 23, 22, 21, 20, 19, 18, 17, 16
  passedBenchmark: 13

The test passes when the number of data points that pass the filter, summed over all simulated variables, is equal to the passedBenchmark value. The developer of the test is responsible for finding the correct passedBenchmark value. You can determine this number by examining the observation file (obsfile), in this case:

h5dump <build-directory>/ufo/test/Data/ufo/testinput_tier_1/filters_testdata.nc4 | less
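You can also check the benchmark by hand. The sketch below counts the (variable, location) pairs that survive an inclusive bounds check, using the ObsValues listed in the YAML comments. It assumes, consistent with the value 13 above, that passedBenchmark is summed over all simulated variables; kObsValues and countPassed are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ObsValue columns copied from the comments in the YAML test
// (10 locations per variable).
const std::vector<std::vector<float>> kObsValues = {
    {10, 11, 12, 13, 14, 15, 16, 17, 18, 19},  // variable1
    {10, 12, 14, 16, 18, 20, 22, 24, 26, 28},  // variable2
    {25, 24, 23, 22, 21, 20, 19, 18, 17, 16},  // variable3
};

// Count (variable, location) pairs that pass an inclusive [vmin, vmax] check.
// filtered[jv] says whether variable jv is listed under "filter variables";
// values of variables that are not filtered always pass.
int countPassed(const std::vector<std::vector<float>> & values,
                const std::vector<bool> & filtered,
                float vmin, float vmax) {
  assert(values.size() == filtered.size());
  int passed = 0;
  for (std::size_t jv = 0; jv < values.size(); ++jv) {
    for (float v : values[jv]) {
      if (!filtered[jv] || (v >= vmin && v <= vmax)) ++passed;
    }
  }
  return passed;
}
```

countPassed(kObsValues, {true, true, true}, 14.0f, 19.0f) returns 13 (6 values from variable1, 3 from variable2, and 4 from variable3), matching passedBenchmark. For the second mini-test described next, pass {false, true, true} so that variable1 is left unfiltered, together with that test's bounds.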

You can define multiple mini-tests for your filter in one YAML configuration file. Now add a second mini-test that uses your new Practical Bounds Check filter to keep only data points with ObsValues between 15.0 and 20.0 (that is, to reject values less than 15.0 or greater than 20.0), applied only to variable2 and variable3. Notice that all data points in variable1 will pass because variable1 is not listed under filter variables in this test. Append this test to the observations list: you can copy the previous test and modify its obs filters section, or you can use the template below.

- obs space:
    name: test data
    obsdatain:
      obsfile: Data/ufo/testinput_tier_1/filters_testdata.nc4
    simulated variables: [variable1, variable2, variable3]
  obs filters:
  - filter: ...        # test min/max value with variable2 and variable3
    filter variables:
    - name: ...
    - name: ...
    minvalue: ...
    maxvalue: ...
# Compare variables with minvalue/maxvalue
#  variable1@ObsValue = 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
#  variable2@ObsValue = 10, 12, 14, 16, 18, 20, 22, 24, 26, 28
#  variable3@ObsValue = 25, 24, 23, 22, 21, 20, 19, 18, 17, 16
  passedBenchmark: ...

Note that the test/testinput directory exists in both the source and build directories. cd to $HOME/jedi/build-release/ufo/test/testinput and run the command ls -l. You can see that the YAML files in the build directory are symbolic links to the YAML files in the source directory, so you can edit a YAML file in either the build or the source directory; both will work. (A link for your new YAML file appears only after you register it in Step 7 and rebuild, because the links are created when CMake runs.)

Step 7: Register your test to CMakeLists.txt

Now you need to register your new test with CMake by adding it to $HOME/jedi/fv3-bundle/ufo/test/CMakeLists.txt. First, add your YAML configuration file to the ufo_test_input list. Next, under the Test UFO ObsFilters (generic) section, add your test using the ecbuild_add_test command:

ecbuild_add_test( TARGET  test_ufo_qc_gen_practical_boundscheck
                  COMMAND ${CMAKE_BINARY_DIR}/bin/test_ObsFilters.x
                  ARGS    "testinput/qc_practical_boundscheck.yaml"
                  ENVIRONMENT OOPS_TRAPFPE=1
                  DEPENDS test_ObsFilters.x
                  TEST_DEPENDS ufo_get_ufo_test_data )

Step 8: Run your new test

Now you are ready to test your filter! Don’t forget to rebuild UFO first: enter <build-directory>/ufo and run make -j8. Next, you can list all the UFO tests using ctest -N, or only the matching ones using ctest -N -R practical. Can you find your new test in the list? Now run your test using:

ctest -R name_of_your_test

Did your test pass? When writing a new test, it is always a good idea to also test failure conditions. Modify your YAML configuration file to make your test fail. You do not need to rebuild the bundle if you are only making changes to the YAML files. You can simply rerun your test after modifying the YAML file. Run your test in verbose mode to see the detailed output.

ctest -VV -R name_of_your_test

Did your test fail as expected? Don’t forget to change your YAML file back to the passing condition. You can add more tests to your YAML configuration file to make your new filter robust.

To confirm that your new YAML file is linked into the build tree, execute the command ls -al in <build-directory>/ufo/test/testinput.

You can find the solution for this practical under feature/new_qc_test_solution branch in jedi-da-academy/ufo repository: https://github.com/jedi-da-academy/ufo/tree/feature/new_qc_test_solution