Feature/regional/condense mesoscale#970

Open
MarcelCaron-NOAA wants to merge 21 commits into NOAA-EMC:feature/rm_nam_hiresw_sref from MarcelCaron-NOAA:feature/regional/condense_mesoscale

@MarcelCaron-NOAA
Contributor

Note to developers: You must use this PR template!

Description of Changes

Please include a summary of the changes and the related GitHub issue(s). Please also include relevant motivation and context.

This PR removes the mesoscale component from EVS and moves RAP verification into cam. Some details:

  • mesoscale prep was completely removed. Its only purpose was to generate SPC Outlook masks, which are already generated in cam.
  • mesoscale plots jobs included logic for time-shifted RAP verification, which did not previously exist in cam. This capability was added to cam in this PR.
  • the time-shifting logic in mesoscale conflicted with the ability to equalize forecasts used in verification. This PR includes a fix that allows time-shifting and sample equalization to be used together (and therefore allows sample sizes to be shown on plots).
  • RAP is typically displayed in red, but since red is already used for HRRR and other colors are reserved for the REFS members, RAP now uses the next available color: purple.
  • RAP ASNOW verification uses a different FCST_LEV setting than cam. This value was adjusted to match cam to allow plotting alongside the cam models.

Developer Questions and Checklist

  • Is this a high priority PR? If so, why and is there a date it needs to be merged by?

Yes; code freeze date for RRFS work is currently set for June 17.

  • Do you have any planned upcoming annual leave/PTO?

No

  • Are there any changes needed in the times when the jobs are supposed to run/kick-off?

No

  • The code changes follow NCO's EE2 Standards.
  • Developer's name is removed throughout the code, and ${USER} is used where necessary.
  • References to the feature branch for HOMEevs are removed from the code.
  • J-Job environment variables, COMIN and COMOUT directories, and output follow what has been defined for EVS.
  • Jobs over 15 minutes in runtime have restart capability.
  • If applicable, changes in the dev/drivers/scripts or dev/modulefiles have been made in the corresponding ecf/scripts and ecf/defs/evs-nco.def?
  • Jobs contain the appropriate file checking and don't run METplus for any missing data.
  • Code is using METplus wrappers structure and not calling MET executables directly.
  • Log is free of any ERRORs or WARNINGs.

Testing Instructions

Please include testing instructions for the PR assignee. Include all relevant input datasets needed to run the tests.

1) Clone this branch for testing

  • cd into any testing location on WCOSS2-dev, e.g. /lfs/h2/emc/vpppg/noscrub/${USER}
  • Set up this EVS branch for testing:
git clone https://github.com/MarcelCaron-NOAA/EVS.git
cd EVS # You are now in HOMEevs
git checkout feature/regional/condense_mesoscale

2) Which jobs to test

  • Follow setup and testing instructions for the following jobs (called "${driver}" hereafter):
jevs_stats_cam_rap_grid2obs
jevs_stats_cam_rap_precip
jevs_stats_cam_rap_snowfall
jevs_plots_cam_grid2obs_last90days
jevs_plots_cam_precip_last90days
jevs_plots_cam_snowfall_last90days
jevs_plots_cam_headline

[Total: 3 stats jobs; 4 plots jobs]

3) Set up jobs

  • symlink the EVS_fix directory locally as "fix":
cd $HOMEevs
mkdir fix
ln -s /lfs/h2/emc/vpppg/noscrub/emc.vpppg/verification/EVS_fix/* fix/
  • In each driver script for the listed jobs (see the list of jobs above):
vi dev/drivers/scripts/${STEP}/cam/${driver}.sh 

... where ${STEP} is prep, stats, or plots, and ${driver} is the name of the job

  • Then edit or add the following environment variables:

For all drivers:
HOMEevs - set to your test EVS directory
COMOUT - set to your test output directory
KEEPDATA - (optional) set to "YES"
SENDMAIL - (optional) set to "NO"
DATAROOT - (optional) set to your test DATAROOT directory

For stats drivers:
COMIN - set to /lfs/h2/emc/vpppg/noscrub/emc.vpppg/$NET/$evs_ver_2d

For plots drivers:
COMIN - set to /lfs/h2/emc/vpppg/noscrub/marcel.caron/test/condense_mesoscale

4) Test jobs

  • cd into your test directory (where you want the log files to go)

  • Submit jevs_stats_cam_rap_grid2obs using qsub -v vhr=08 ${driver}.sh

  • Submit all other drivers using qsub -v vhr=00 ${driver}.sh

We can discuss whether or not we want to test anything else.
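
The submission steps above could be scripted along these lines. This is only a dry-run sketch: `vhr_for` and `submit_driver` are hypothetical helper names (not part of EVS), and it assumes each `${driver}.sh` is reachable from the current directory. It prints the `qsub` commands rather than running them; drop the `echo` in `submit_driver` to actually submit.

```shell
#!/bin/sh
# Dry-run sketch: print the qsub command for each test driver.
# vhr_for and submit_driver are hypothetical helpers, not EVS functions.

# Print the vhr value to pass for a given driver:
# 08 for the RAP grid2obs stats job, 00 for everything else.
vhr_for() {
  case "$1" in
    jevs_stats_cam_rap_grid2obs) echo "08" ;;
    *) echo "00" ;;
  esac
}

# Print (not run) the qsub command for one driver.
# Assumption: ${driver}.sh can be found from the current directory.
submit_driver() {
  echo qsub -v "vhr=$(vhr_for "$1")" "$1.sh"
}

for driver in \
  jevs_stats_cam_rap_grid2obs \
  jevs_stats_cam_rap_precip \
  jevs_stats_cam_rap_snowfall \
  jevs_plots_cam_grid2obs_last90days \
  jevs_plots_cam_precip_last90days \
  jevs_plots_cam_snowfall_last90days \
  jevs_plots_cam_headline
do
  submit_driver "$driver"
done
```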

5) Check jobs

  • I will check the logs for the following keywords:
check="FATAL\|WARNING\|error\|Error\|Killed\|Cgroup\|argument expected\|No such file\|cannot\|failed\|unexpected\|exceeded"
grep "$check" $logfile
  • NOTE 1: Test RAP precip and snowfall stats files will appear in COMOUTsmall. grid2obs stats will be in COMOUTfinal.
  • NOTE 2: RAP ASNOW test statistics use different level information than cam, and won't show up on graphics. This issue is addressed in this PR, and we can check output snowfall stats to confirm that.
  • NOTE 3: I don't have an adequate RRFS Member stats directory to test with here, so plots jobs will include many warnings about missing rrfsmemX models, and the graphics themselves will include only RRFS, HRRR, and RAP. If reviewers feel this is insufficient, I can work on setting up some dummy data.
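
For anyone checking several logs at once, the grep above can be wrapped in a small function. This is a minimal sketch, assuming GNU grep (for the `\|` alternation already used in `$check`); `scan_log` is a hypothetical helper name, not something in EVS.

```shell
#!/bin/sh
# Keyword list copied from the check above.
check="FATAL\|WARNING\|error\|Error\|Killed\|Cgroup\|argument expected\|No such file\|cannot\|failed\|unexpected\|exceeded"

# scan_log (hypothetical helper): print matching lines with their line
# numbers; return 0 if the log has no matches and 1 otherwise.
scan_log() {
  if grep -n "$check" "$1"; then
    return 1
  fi
  return 0
}

# Example use:
#   scan_log "$logfile" && echo "clean: $logfile"
```

Because `WARNING` is one of the keywords, the expected rrfsmemX warnings from the plots jobs will be reported too and still need to be reviewed by eye.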

6) During testing ...

  • If I push changes or fixes during testing, you can usually update your branch like this:
git pull origin feature/regional/condense_mesoscale

Comment thread scripts/stats/cam/exevs_stats_cam_rap_snowfall.sh Outdated
Comment thread ush/cam/cam_rap_check_input_data.py Outdated
  print(f"ERROR: Undefined data type for missing data file: {info[1]}"
        + f"\nPlease edit the get_data_type() function in"
-       + f" USHevs/mesoscale/mesoscale_util.py")
+       + f" USHevs/cam/mesoscale_util.py")
Contributor

If we are removing all instances of "mesoscale", should this script be renamed "cam_util.py"? However, I realize there might already exist a script by that name so in that case, it makes sense to keep it "mesoscale_util.py" to avoid overwriting.

Contributor Author

@ShannonShields-NOAA That's a good question. I kept "mesoscale" in this instance for simplicity and to avoid accidentally mixing rap and cam utilities. mesoscale_util.py is used in a lot of RAP verification scripts. I could maybe change it to "rap_util.py"?

CC @AndrewBenjamin-NOAA

Contributor

@MarcelCaron-NOAA I'll let @AndrewBenjamin-NOAA chime in about how we should address this. I think it depends on if we need to remove all instances of mesoscale to avoid any confusion with NCO...

Comment thread ush/cam/cam_rap_check_input_data.py Outdated
Comment thread ush/cam/cam_rap_stats_grid2obs_create_job_script.py Outdated
Comment thread ush/cam/cam_rap_stats_grid2obs_create_poe_job_scripts.py Outdated
Comment thread ush/cam/rap_util.py Outdated
@ShannonShields-NOAA
Contributor

@MarcelCaron-NOAA For testing purposes, what VDATE should I use? Is the default ok or should I set a specific date?

@MarcelCaron-NOAA
Contributor Author

> @MarcelCaron-NOAA For testing purposes, what VDATE should I use? Is the default ok or should I set a specific date?

The defaults should be fine!

Comment thread ush/cam/rap_util.py Outdated
@ShannonShields-NOAA
Contributor

@MarcelCaron-NOAA I have run the following stats jobs. Everything looks good to me with no ERRORs or WARNINGs and I see output stat files. Please confirm.

jevs_stats_cam_rap_grid2obs_00

Log: /lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/dev/drivers/scripts/stats/cam/jevs_stats_cam_rap_grid2obs_00.o120234053
DATA: /lfs/h2/emc/stmp/shannon.shields/evs_test/prod/tmp/jevs_stats_cam_rap_grid2obs_00.120234053.dbqs01
Stats: /lfs/h2/emc/vpppg/noscrub/shannon.shields/evs/v2.0/stats/cam/rap.20260527

jevs_stats_cam_rap_precip_00

Log: /lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/dev/drivers/scripts/stats/cam/jevs_stats_cam_rap_precip_00.o120234250
DATA: /lfs/h2/emc/stmp/shannon.shields/evs_test/prod/tmp/jevs_stats_cam_rap_precip_00.120234250.dbqs01
Stats: /lfs/h2/emc/vpppg/noscrub/shannon.shields/evs/v2.0/stats/cam/atmos.20260526/rap/precip

jevs_stats_cam_rap_snowfall_00

Log: /lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/dev/drivers/scripts/stats/cam/jevs_stats_cam_rap_snowfall_00.o120234493
DATA: /lfs/h2/emc/stmp/shannon.shields/evs_test/prod/tmp/jevs_stats_cam_rap_snowfall_00.120234493.dbqs01
Stats: /lfs/h2/emc/vpppg/noscrub/shannon.shields/evs/v2.0/stats/cam/atmos.20260526/rap/snowfall

@MarcelCaron-NOAA
Contributor Author

@ShannonShields-NOAA Agreed, logs are clean and stats look normal compared to the parallel (the minor differences I see stem from differences between the develop and feature/rm_nam_hiresw_sref branches, not from this PR).
jevs_stats_cam_rap_grid2obs
jevs_stats_cam_rap_precip
jevs_stats_cam_rap_snowfall

👍 stats test jobs look good, no concerns from me

@ShannonShields-NOAA
Contributor

Excellent! Since my workday is ending soon, I will test the plot jobs tomorrow.
@MarcelCaron-NOAA

@MarcelCaron-NOAA
Contributor Author

Sounds good!

@ShannonShields-NOAA
Contributor

@MarcelCaron-NOAA Some of the plot jobs are done running; please check and confirm everything looks right. I saw the expected WARNINGs about the rrfsmem data:

WARNING: rrfsmem1 is not a model in /lfs/h2/emc/stmp/shannon.shields/evs_test/prod/tmp/jevs_plots_cam_headline.120382659.dbqs01/headline/data/SL1L2_DPT2M_Alaska_LAST90DAYS/tmpe6adc18223064e349968610ac224c496.
05/29 12:53:50.322 (df_preprocessing.py:131) WARNING: It may be a group name, or else check if the stats_dir (/lfs/h2/emc/vpppg/noscrub/marcel.caron/test/condense_mesoscale/stats/cam) includes rrfsmem1 data according to the output_base template, given domain, variable, etc...

I can also confirm I saw the expected message for time-shifted RAP verification:
05/29 12:52:48.156 (lead_average.py:1606) DEBUG: A 'shift' query was included with the rap model. Consider instead setting 'delete_intermed_data' to True in settings.py, which allows mismatched comparisons between models that share no common lead hours.

jevs_plots_cam_headline

Log: /lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/dev/drivers/scripts/plots/cam/jevs_plots_cam_headline.o120382659
DATA: /lfs/h2/emc/stmp/shannon.shields/evs_test/prod/tmp/jevs_plots_cam_headline.120382659.dbqs01
Plots: /lfs/h2/emc/ptmp/shannon.shields/evs/v2.0/plots/cam/headline.20260528

jevs_plots_cam_snowfall_last90days

Log: /lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/dev/drivers/scripts/plots/cam/jevs_plots_cam_snowfall_last90days.o120382307
DATA: /lfs/h2/emc/stmp/shannon.shields/evs_test/prod/tmp/jevs_plots_cam_snowfall_last90days.120382307.dbqs01
Plots: /lfs/h2/emc/ptmp/shannon.shields/evs/v2.0/plots/cam/atmos.20260528

@ShannonShields-NOAA
Contributor

@MarcelCaron-NOAA The precip plot job has finished (the last plot job of grid2obs will take the rest of the day it looks like).

jevs_plots_cam_precip_last90days

Log: /lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/dev/drivers/scripts/plots/cam/jevs_plots_cam_precip_last90days.o120382163
DATA: /lfs/h2/emc/stmp/shannon.shields/evs_test/prod/tmp/jevs_plots_cam_precip_last90days.120382163.dbqs01
Plots: /lfs/h2/emc/ptmp/shannon.shields/evs/v2.0/plots/cam/atmos.20260528

@MarcelCaron-NOAA
Contributor Author

@ShannonShields-NOAA Thanks for running these. I can see I missed a few things:

  • The DEBUG line recommending the 'delete_intermed_data' setting is an outdated note from mesoscale and is unnecessary, so I went ahead and deleted it.
  • I also noticed a KeyError in the snowfall log and pushed a fix for that. Do you mind at least rerunning the snowfall job to confirm the fix?

@ShannonShields-NOAA
Contributor

@MarcelCaron-NOAA Yes, I will re-run the snowfall plot job. Should I re-run the precip job or stop and re-run the grid2obs job?

@MarcelCaron-NOAA
Contributor Author

The precip plots job completed cleanly and output graphics look normal to me. Here's an example graphic with the RAP line included:
[image: example graphic with the RAP line included]

The graphic at this link shows what the RRFS Member lines will look like for comparison.

jevs_plots_cam_precip_last90days

@MarcelCaron-NOAA
Contributor Author

> @MarcelCaron-NOAA Yes, I will re-run the snowfall plot job. Should I re-run the precip job or stop and re-run the grid2obs job?

I think rerunning snowfall, and maybe headline as well, is sufficient to confirm the two changes! So there's no need to rerun precip or restart grid2obs, in my opinion.

@MarcelCaron-NOAA
Contributor Author

> (the last plot job of grid2obs will take the rest of the day it looks like).

Also, just a note that grid2obs plots took about 3 hours to finish in my tests, a lot less than its walltime. Not plotting the five RRFS Members reduces the job's runtime substantially.

@ShannonShields-NOAA
Contributor

@MarcelCaron-NOAA Here is the output for the re-runs of headline and snowfall plots.

jevs_plots_cam_headline

Log: /lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/dev/drivers/scripts/plots/cam/jevs_plots_cam_headline.o120392147
DATA: /lfs/h2/emc/stmp/shannon.shields/evs_test/prod/tmp/jevs_plots_cam_headline.120392147.dbqs01
Plots: /lfs/h2/emc/ptmp/shannon.shields/evs/v2.0/plots/cam/headline.20260528

jevs_plots_cam_snowfall_last90days

Log: /lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/dev/drivers/scripts/plots/cam/jevs_plots_cam_snowfall_last90days.o120392164
DATA: /lfs/h2/emc/stmp/shannon.shields/evs_test/prod/tmp/jevs_plots_cam_snowfall_last90days.120392164.dbqs01
Plots: /lfs/h2/emc/ptmp/shannon.shields/evs/v2.0/plots/cam/atmos.20260528

@MarcelCaron-NOAA
Contributor Author

Thanks @ShannonShields-NOAA! Both jobs completed cleanly and their output looks normal. Sample graphics look normal and show RAP statistics as expected. Also as expected, there are no DEBUG statements in the log about the 'shift' query.
jevs_plots_cam_headline
jevs_plots_cam_snowfall_last90days

grid2obs is still running, but I expect it to complete by noon today.
jevs_plots_cam_grid2obs_last90days

@ShannonShields-NOAA
Contributor

@MarcelCaron-NOAA The grid2obs plot job is still running, but I thought I'd check the log file so far and noticed the following KeyError:

Pruning /lfs/h2/emc/vpppg/noscrub/marcel.caron/test/condense_mesoscale/stats/cam files for model hrrr, vx_mask CONUS_West, variable  PTYPE , line_type MCTC, interp BILIN, interp points
END: prune_stat_files.py
Traceback (most recent call last):
  File "/apps/prod/ve/intel/19.1.3.304/python/3.10.4/evs/2.0/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'COUNTS'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/ush/cam/performance_diagram.py", line 1650, in <module>
    main()
  File "/lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/ush/cam/performance_diagram.py", line 1508, in main
    plot_performance_diagram(
  File "/lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/ush/cam/performance_diagram.py", line 439, in plot_performance_diagram
    stat_output = plot_util.calculate_stat(
  File "/lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/ush/cam/plot_util.py", line 1810, in calculate_stat
    counts = model_data.loc[:]['COUNTS']
  File "/apps/prod/ve/intel/19.1.3.304/python/3.10.4/evs/2.0/lib/python3.10/site-packages/pandas/core/frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/apps/prod/ve/intel/19.1.3.304/python/3.10.4/evs/2.0/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'COUNTS'

It seems something might be wrong with the calculate_stat function for performance diagrams?

jevs_plots_cam_grid2obs_last90days

Log: /lfs/h2/emc/vpppg/noscrub/shannon.shields/pr970test/EVS/dev/drivers/scripts/plots/cam/jevs_plots_cam_grid2obs_last90days.o120381934
DATA: /lfs/h2/emc/stmp/shannon.shields/evs_test/prod/tmp/jevs_plots_cam_grid2obs_last90days.120381934.dbqs01

@MarcelCaron-NOAA
Contributor Author

@ShannonShields-NOAA Thanks for catching this early. It looks like this is happening for the PTYPE graphic, and is related to an issue plotting MCTC statistics. I'll do some more testing and will push a fix before the end of the workday.

@MarcelCaron-NOAA
Contributor Author

MarcelCaron-NOAA commented May 29, 2026

@ShannonShields-NOAA I made a fix and confirmed in my tests that the fix clears up the remaining errors and any warnings about sample equalization. Ready to continue testing after the weekend. Thanks!

Labels

enhancement (New feature or request), fix

Projects

None yet

2 participants