I am trying to run a simulation using WRF-3.9.1.1.
Issue: the simulation crashes in hybrid MPI + OpenMP (multithreaded) mode, but runs fine in pure MPI mode.
We have 36 cores per node. Running on 10 nodes (360 cores) with 2 OpenMP threads per MPI process:
```shell
export OMP_NUM_THREADS=2
aprun -n 180 -N 18 /home/appuser/WRF/real/wrf.exe
```
Here `-n` is the total number of MPI processes (same as `-np`/`-n` for mpirun) and `-N` is the number of MPI processes per node (same as `-ppn`). With this setup the simulation fails immediately after writing the first wrfout*.nc file.
By contrast, the following pure MPI setup runs fine:
```shell
export OMP_NUM_THREADS=1
aprun -n 360 -N 36 /home/appuser/WRF/real/wrf.exe
```
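Both launches occupy all 360 cores; only the split between MPI ranks and OpenMP threads differs. The sketch below makes that arithmetic explicit with a hypothetical helper, `aprun_layout`, and also emits aprun's `-d` (depth) option, which reserves that many CPUs per process for its threads. The hybrid command above omits `-d`, so aprun's default depth of 1 may place both threads of a rank on the same core. That is an assumption worth checking on your system; it would hurt performance rather than explain a segfault by itself.

```shell
#!/bin/sh
# Hypothetical helper: print aprun layout options for a fully packed
# hybrid MPI + OpenMP run (ranks per node = cores per node / threads).
aprun_layout() {
    nodes=$1            # number of nodes
    cores_per_node=$2   # physical cores per node (36 here)
    threads=$3          # value of OMP_NUM_THREADS
    if [ $((cores_per_node % threads)) -ne 0 ]; then
        printf '%s\n' "threads must divide cores per node" >&2
        return 1
    fi
    per_node=$((cores_per_node / threads))
    total=$((nodes * per_node))
    # -d gives each MPI rank one CPU per OpenMP thread
    printf '%s\n' "-n $total -N $per_node -d $threads"
}

aprun_layout 10 36 2   # hybrid layout from the post
aprun_layout 10 36 1   # pure MPI layout from the post
```

In a job script this would be used as `aprun $(aprun_layout 10 36 2) /home/appuser/WRF/real/wrf.exe`, i.e. `aprun -n 180 -N 18 -d 2 ...`.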
So I recompiled the code in debug configuration. Fortunately, -traceback shows the call stack:
```
in grelldrv
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
wrf.exe 0000000003B4EFE4 Unknown Unknown Unknown
wrf.exe 00000000035709E0 Unknown Unknown Unknown
wrf.exe 0000000002BF877A module_cu_g3_mp_c 1492 module_cu_g3.f90
wrf.exe 0000000002BE4CF8 module_cu_g3_mp_g 442 module_cu_g3.f90
wrf.exe 000000000248F70E module_cumulus_dr 1079 module_cumulus_driver.f90
wrf.exe 00000000038D0DD3 Unknown Unknown Unknown
wrf.exe 0000000003889550 Unknown Unknown Unknown
wrf.exe 00000000038887D5 Unknown Unknown Unknown
wrf.exe 00000000038D1199 Unknown Unknown Unknown
wrf.exe 0000000003959E44 Unknown Unknown Unknown
wrf.exe 0000000003CA5CF9 Unknown Unknown Unknown
```
From the trace, I believe the code crashes at line 1492 of module_cu_g3.f90. If anyone has run into this error, please advise on a fix.
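Most frames in the trace are reported as "Unknown". If wrf.exe was built with debug symbols, GNU binutils' `addr2line` may be able to resolve those program counters to functions. The sketch below assumes the traceback has been saved to a text file (the name `traceback.txt` is hypothetical) and uses a hypothetical helper, `unknown_pcs`, to pull out the PCs of the unresolved wrf.exe frames.

```shell
#!/bin/sh
# Hypothetical helper: read an Intel forrtl traceback on stdin
# (columns: Image PC Routine Line Source) and print the PC, with a 0x
# prefix, of every wrf.exe frame whose routine is "Unknown".
unknown_pcs() {
    awk '$1 == "wrf.exe" && $3 == "Unknown" { print "0x" $2 }'
}

# Two frames copied from the traceback above: only the first is unresolved.
trace='wrf.exe 0000000003B4EFE4 Unknown Unknown Unknown
wrf.exe 0000000002BF877A module_cu_g3_mp_c 1492 module_cu_g3.f90'
printf '%s\n' "$trace" | unknown_pcs
```

The resulting addresses can then be passed on, e.g. `unknown_pcs < traceback.txt | xargs addr2line -f -e /home/appuser/WRF/real/wrf.exe`. Frames that come from runtime libraries without debug information may still resolve only to `??`.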
I have uploaded the relevant files to a bitbucket repo. Please let me know if I can provide any further information.
FYI:
```shell
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2062441
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 2062441
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
```
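One thing that may be worth checking: `stack size ... unlimited` applies to the main thread of each process. With the Intel OpenMP runtime (the `forrtl` messages indicate an Intel Fortran build), the extra OpenMP worker threads get their stacks from the `OMP_STACKSIZE` environment variable (or the Intel-specific `KMP_STACKSIZE`) instead, so a run with 2 threads per rank can overflow a worker-thread stack even though the pure MPI run does not. This is a hypothesis consistent with the symptoms, not a confirmed diagnosis. The helper `check_omp_stack` below is hypothetical, and the `512M` value is an assumed example that should be sized to the memory available per node.

```shell
#!/bin/sh
# Hypothetical helper: warn when more than one OpenMP thread is requested
# but no worker-thread stack size is set. ulimit -s only covers the main
# thread; worker threads use OMP_STACKSIZE (or KMP_STACKSIZE for Intel).
check_omp_stack() {
    threads=${OMP_NUM_THREADS:-1}
    if [ "$threads" -gt 1 ] && [ -z "${OMP_STACKSIZE:-}" ] && [ -z "${KMP_STACKSIZE:-}" ]; then
        printf '%s\n' "warning: $threads threads but no OpenMP stack size set"
        return 1
    fi
    printf '%s\n' "ok"
}

export OMP_NUM_THREADS=2
export OMP_STACKSIZE=512M   # assumed value, per worker thread
check_omp_stack
```

In the job script these exports would go before the `aprun` line, alongside the existing `ulimit -s unlimited` for the main thread.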