Tempo version 3 #353
@AndersJensen-NOAA Could you please fill out the pull request template so that this PR can be reviewed?
@grantfirl I made a few changes and updated to TEMPO v3.0.6. On Ursa, my regression tests sometimes fail with an out-of-memory (OOM) error, but this seems to depend on the node: if I resubmit the job_card, the test passes.
Are you going to merge AndersJensen-NOAA#6? |
The changes you added were already made in AndersJensen-NOAA#6. That PR also updates to the latest ufs/dev, which needs to happen anyway (and required resolving many merge conflicts by hand), and it fixes the physical-constant problem identified in this PR review.
Regarding the OOM failures, the EPIC code managers have asked that they be fixed: even intermittent failures disrupt code-management practices. I believe you said in an earlier GitHub comment that TEMPO v3 would fix the OOM failures, but with v3 on Ursa I'm seeing more of them, not fewer.
@grantfirl I merged your changes in. I might have messed up the part where you deleted the old TEMPO scheme, so I will go back and fix that.
Also, @grantfirl, some files were deleted in your PR. Do I need a new CCPP config?
I think the only thing deleted was the original TEMPO submodule, so unless the ccpp_prebuild_config file referenced something in the old TEMPO submodule instead of TEMPO_v3, you shouldn't need a new one. I also didn't run into any issues during the ccpp_prebuild phase of the build, so I think we're good.
@grantfirl I have now updated everything from ccpp, ufsatm, and the UFS weather model. Can you take a look and check whether your test works?
Ignore this; I hadn't updated ufsatm, which contains the updated ccpp prebuild needed for your CCPP changes.
    branch = main
[submodule "physics/MP/TEMPO/tempo_v3"]
    path = physics/MP/TEMPO/tempo_v3
    url = https://github.com/grantfirl/TEMPO.git
This should be switched to:
url = https://github.com/NCAR/TEMPO
branch = tempo_3.1.0
since the commit hash of TEMPO that you're pointing to is in that branch, right?
Yes, but ultimately 3.1.0 will be merged into main.
OK, but for purposes of this PR chain, will 3.1.0 be merged into main before the chain is merged? Would it make sense to have a PR in TEMPO that merges the tempo_3.1.0 branch into main that is then part of this PR chain?
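A minimal sketch of the .gitmodules change suggested above, assuming a ccpp-physics checkout in which the submodule is named after its path (physics/MP/TEMPO/tempo_v3, as in the entry quoted earlier). The helper names are hypothetical and not part of ccpp-physics or rt.sh. `git config -f` edits .gitmodules in place; `git submodule sync` then copies the new URL into .git/config, and `git submodule update` checks out the commit recorded by the superproject.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of ccpp-physics or rt.sh).

TEMPO_SUBMODULE="physics/MP/TEMPO/tempo_v3"

# Point the tempo_v3 entry of a .gitmodules file at NCAR/TEMPO on the
# tempo_3.1.0 branch. `git config -f` edits the file directly, so this
# step works on any .gitmodules-style file, inside a repository or not.
retarget_tempo_submodule() {
  local gitmodules="${1:-.gitmodules}"
  git config -f "$gitmodules" "submodule.${TEMPO_SUBMODULE}.url" "https://github.com/NCAR/TEMPO"
  git config -f "$gitmodules" "submodule.${TEMPO_SUBMODULE}.branch" "tempo_3.1.0"
}

# Run from the top of the ccpp-physics checkout after editing .gitmodules:
# `sync` propagates the new URL into .git/config, and `update --init`
# checks out the commit pinned by the superproject.
refresh_tempo_submodule() {
  git submodule sync --recursive -- "$TEMPO_SUBMODULE"
  git submodule update --init --recursive -- "$TEMPO_SUBMODULE"
}
```

The branch setting does not change which commit gets checked out; it records where the pinned commit lives and is only consulted by `git submodule update --remote`.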
I tried running the existing TEMPO tests: 2 of 3 ran to completion, and the third (control_p8_ugwpv1_tempo_aerosol_intel) hit an OOM error. Its run directory is /scratch3/BMC/gmtb/Grant.Firl/stmp2/Grant.Firl/FV3_RT/rt_863677

Edit: I checked out the branches manually; otherwise, the ccpp-physics repo would have pointed to the wrong branch of TEMPO, as discussed upthread.
So the OOM errors do still seem to be intermittent, but it would be good to have a test that reliably completes every time and is closer to how TEMPO is intended to be used.
@grantfirl TEMPO has larger lookup tables than Thompson, so it seems to need a bit more memory. On Ursa, I'm having good luck with runs after adding this to the job_card:
@grantfirl I just pushed changes to the TEMPO tests that should fix the OOM issue.
@grantfirl @dustinswales |
@AndersJensen-NOAA I can take a look. Are you sure that they completed? I don't see the /scratch4/BMC/wrfruc/jensen/ufs_tempo_dev_test/tests/logs/RegressionTests_ursa.log file, only a backup from yesterday.
@grantfirl If they did not complete, then I don't know why they didn't. How do I debug that? |
When you ran rt.sh, did you save a log of the output? I'm seeing compilation failures in TEMPO. I see a bunch of |
|
It looks like the TEMPO tests (control_p8_ugwpv1_tempo_aerosol_intel, control_p8_ugwpv1_tempo_intel, and regional_wofs_tempo_intel) completed successfully; they failed only because of the change in results, which is expected. control_wam_debug_gnu failed due to a time-out, which happens occasionally and usually isn't our fault. cpld_debug_sfs_intel and cpld_debug_sfs_intelllvm failed due to OOM. The compilation failures look like real-type errors, and all of them occur in tests that vary the default real type, so I would double-check any recent changes in TEMPO related to real types.
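As a rough first pass at the real-type check suggested above, the hypothetical helper below greps Fortran sources for hard-coded double-precision declarations and `d`-exponent literals. It assumes (this is not confirmed in the thread) that the failing builds come from kinds that stay fixed when the tests change the default real type, instead of following the configurable physics kind (e.g. kind_phys). It is a heuristic and will report false positives.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic: list lines in Fortran sources that hard-code
# double precision and so may not follow a change of default real kind.
find_hardcoded_real_kinds() {
  local dir="${1:?usage: find_hardcoded_real_kinds <source-dir>}"
  # -i: Fortran is case-insensitive; -r -n: recurse and show line numbers.
  # Patterns: real*8, real(8) / real(kind=8), double precision, 1.0d0-style literals.
  grep -rniE \
    --include='*.f90' --include='*.F90' --include='*.f' --include='*.F' \
    -e 'real[[:space:]]*\*[[:space:]]*8' \
    -e 'real[[:space:]]*\([[:space:]]*(kind[[:space:]]*=[[:space:]]*)?8[[:space:]]*\)' \
    -e 'double[[:space:]]+precision' \
    -e '[0-9]\.[0-9]*d[+-]?[0-9]+' \
    "$dir" || true   # grep exits 1 when nothing matches; treat that as success
}
```

For example, `find_hardcoded_real_kinds physics/MP/TEMPO/tempo_v3` would list candidate lines to compare against recent TEMPO commits.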
@grantfirl Thanks! I'll compile the cpld_debug tests and see if I can find the issue. |
I need help getting these two regression tests to pass. I modified GFS_rrtmgp_pre for the TEMPO hookup, and now the two MPAS tests fail. Those tests appear to be radiation-only physics tests specific to MPAS. Since the UFS isn't using MPAS yet and these aren't full physics tests, I think they should be turned off. If not, can one of you fix the TEMPO hookup on the CCPP side? Or, better yet, address #361. Thanks.
We use those tests to keep the MPAS-in-UFS functionality working, so they need to stay. It looks like your RTs were run with the wrong (old) version of TEMPO checked out. I addressed the comment in #353 (comment). Please pull down the latest commit of this PR branch with the .gitmodules fix and run

Once that's done, please run those 2 failing MPAS-based tests again. I'm guessing that will fix them.
Still failed: |
Any other ideas, @grantfirl?
Description of Changes:
TEMPO version 3. See the release notes and documentation in the TEMPO repository: https://github.com/NCAR/TEMPO
Tests Conducted:
TEMPO regression tests
Dependencies:
NOAA-EMC/ufsatm#1063
ufs-community/ufs-weather-model#3078
Documentation:
https://ncar.github.io/TEMPO/
Issue (optional):
Contributors (optional):