Post by MeteoAdriatic » Sun Feb 25, 2018 12:35 pm


I have a strange problem on one installation. I have seen it very rarely on other projects, but on this one it happens often.

Randomly, for example once every several successful runs, wrf.exe just stops integrating forward in time. All MPI processes keep running at 100% CPU, but no further progress is ever made. In that case the rsl.out/rsl.error logs simply stop at a random point, with no error message or anything else that differs from a normal run.

Of course, such a run never finishes. I have to kill it...
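
For what it's worth, a hang like this can be detected automatically by checking when a log file was last written. The following is only a minimal sketch: the helper names and the timeout are hypothetical, and it assumes the log's modification time advances while the model is integrating, which may not hold if output is buffered or a single step takes very long.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stalled(path, timeout_s, now=None):
    """Return True if `path` has not been modified for more than `timeout_s` seconds.

    `timeout_s` is an assumed threshold; choose it well above the longest
    normal gap between log lines for your case.
    """
    return seconds_since_update(path, now) > timeout_s
```

A monitoring script could call something like `is_stalled("rsl.error.0000", 1800)` every few minutes and kill the job when it returns True. The 1800 seconds is just an illustrative value.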

Has anybody experienced this behaviour?

Thank you
Re: wrf.exe stops integrating randomly

Post by dcvz » Tue Mar 13, 2018 3:35 am

It sounds like the run hit a NaN. This happens on certain architectures (it used to happen on old IBMs). It could be a bug in the compiler (upgrade whatever you're using or try a different compiler). It could be an optimization problem (back off the optimization level). If the problem continues, you might have to isolate it to whichever physics or dynamics option is causing it, and then switch to a different option. NaNs can be seen in wrfout files using ncview if the history output time is close enough to the failure time.
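
If ncview is not handy, the same check can be scripted. This is a rough sketch, not a definitive tool: `fields_with_nan` and `load_wrfout_fields` are hypothetical helper names, and the loader assumes the third-party netCDF4 and numpy packages are installed. The variable names passed in (for example T, U, V) are just examples of fields you might want to inspect.

```python
import math


def fields_with_nan(fields):
    """Return the sorted names of fields that contain at least one NaN.

    `fields` maps a variable name to a flat iterable of floats.
    """
    return sorted(
        name
        for name, values in fields.items()
        if any(math.isnan(v) for v in values)
    )


def load_wrfout_fields(path, names):
    """Read the named variables from a wrfout file as flat lists of floats.

    Assumption: netCDF4 and numpy are available; they are imported here so
    that fields_with_nan itself needs only the standard library.
    """
    import numpy as np
    from netCDF4 import Dataset

    with Dataset(path) as ds:
        return {
            n: np.asarray(ds.variables[n][:], dtype=float).ravel().tolist()
            for n in names
        }
```

Something like `fields_with_nan(load_wrfout_fields(path, ["T", "U", "V"]))` on the last history file before the hang would list which of those fields have gone bad. Converting to Python lists is slow for large domains, so a vectorized numpy check would be a reasonable replacement there.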

A NaN problem is normally reproducible: the same case should fail in the same way every time you run it. If it doesn't, then it might be a hardware problem.
