Segmentation fault with WRF-3.9.1.1 (Multithreaded mode)

Post by puneet336 » Mon May 14, 2018 3:36 am

Hi all,
I am trying to run a simulation using WRF-3.9.1.1.
Issue: the simulation crashes in multithreaded (OpenMP) mode but runs fine in pure MPI mode.

We have 36 cores per node. When running on 10 nodes (360 cores) with 2 OpenMP threads per MPI process:
Code:
export OMP_NUM_THREADS=2
aprun -n 180 -N 18 /home/appuser/WRF/real/wrf.exe

where -n = total number of MPI processes (same as -np/-n)
and -N = number of MPI processes per node (same as -ppn)

The simulation fails immediately after writing the first wrfout*.nc file.

By contrast, the following pure MPI run works fine:
Code:
export OMP_NUM_THREADS=1
aprun -n 360 -N 36 /home/appuser/WRF/real/wrf.exe
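
Both launch lines are meant to fill all 36 cores of each node (18 ranks x 2 threads, or 36 ranks x 1 thread). A small bash sketch like the one below can check that arithmetic before submitting; the function name hybrid_layout and its argument order are hypothetical. It also prints aprun's -d (depth) option, which reserves that many cores per MPI rank for its threads. The hybrid command above does not pass -d, and aprun's default depth is 1, so both threads of a rank may be confined to one core. That is more likely to hurt performance than to cause the segfault, but it is cheap to rule out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a hybrid MPI+OpenMP layout fills each node
# exactly, then print (not run) the matching launch lines.
# Arguments: total_ranks ranks_per_node threads_per_rank cores_per_node [exe]
hybrid_layout() {
  local total_ranks=$1 ranks_per_node=$2 threads=$3 cores_per_node=$4
  local exe=${5:-./wrf.exe}

  # Every node must host the same number of ranks.
  if (( total_ranks % ranks_per_node != 0 )); then
    echo "error: ${total_ranks} ranks do not divide evenly into ${ranks_per_node} per node" >&2
    return 1
  fi

  # Ranks per node times threads per rank should equal the physical cores.
  local used=$(( ranks_per_node * threads ))
  if (( used != cores_per_node )); then
    echo "error: ${ranks_per_node} ranks x ${threads} threads = ${used} cores, node has ${cores_per_node}" >&2
    return 1
  fi

  echo "nodes=$(( total_ranks / ranks_per_node )) cores=$(( total_ranks * threads ))"
  echo "export OMP_NUM_THREADS=${threads}"
  # -d gives each rank its own cores for its OpenMP threads.
  echo "aprun -n ${total_ranks} -N ${ranks_per_node} -d ${threads} ${exe}"
}

# Hybrid run from above: prints nodes=10 cores=360, then the two launch lines.
hybrid_layout 180 18 2 36 /home/appuser/WRF/real/wrf.exe
```

For the pure MPI case, hybrid_layout 360 36 1 36 reports the same node and core counts, which is a quick way to confirm the two runs differ only in the thread split.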

So I recompiled the code in a debug configuration. Fortunately, -traceback prints the call stack:
Code:
in grelldrv
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
wrf.exe            0000000003B4EFE4  Unknown               Unknown  Unknown
wrf.exe            00000000035709E0  Unknown               Unknown  Unknown
wrf.exe            0000000002BF877A  module_cu_g3_mp_c        1492  module_cu_g3.f90
wrf.exe            0000000002BE4CF8  module_cu_g3_mp_g         442  module_cu_g3.f90
wrf.exe            000000000248F70E  module_cumulus_dr        1079  module_cumulus_driver.f90
wrf.exe            00000000038D0DD3  Unknown               Unknown  Unknown
wrf.exe            0000000003889550  Unknown               Unknown  Unknown
wrf.exe            00000000038887D5  Unknown               Unknown  Unknown
wrf.exe            00000000038D1199  Unknown               Unknown  Unknown
wrf.exe            0000000003959E44  Unknown               Unknown  Unknown
wrf.exe            0000000003CA5CF9  Unknown               Unknown  Unknown

I believe the code crashes at line 1492 of module_cu_g3.f90. If anyone has run into this error, please advise on a fix.
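
To see which statement sits at the reported line, it can help to print the preprocessed source around it. Below is a minimal bash sketch; the helper name show_line is hypothetical, and it assumes the intermediate module_cu_g3.f90 produced by the build is still present (usually under phys/ in the WRF build tree, though that location is an assumption here).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines (line-ctx)..(line+ctx) of a file,
# numbered, with the target line marked by '>'.
show_line() {
  local file=$1 line=$2 ctx=${3:-5}
  local start=$(( line > ctx ? line - ctx : 1 ))   # clamp at line 1
  local end=$(( line + ctx ))
  awk -v s="$start" -v e="$end" -v t="$line" '
    NR > e { exit }   # stop reading once past the window
    NR >= s { printf "%s%6d  %s\n", (NR == t ? ">" : " "), NR, $0 }
  ' "$file"
}

# Example (path assumed, not confirmed):
# show_line /home/appuser/WRF/phys/module_cu_g3.f90 1492
```

Because the debug build reports line numbers in the .f90 file rather than the original .F source, looking at the .f90 avoids any offset introduced by preprocessing.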

I have uploaded the relevant files to a Bitbucket repo. Please let me know if I can provide any further information.
FYI:
Code:
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2062441
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 2062441
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
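
One thing the ulimit output above does not cover: `ulimit -s` limits only the stack of each process's main thread. Additional OpenMP threads get their stacks from the OpenMP runtime, sized by the standard OMP_STACKSIZE variable (Intel's runtime, which the forrtl messages indicate is in use, also reads KMP_STACKSIZE). A routine that puts large automatic arrays on the stack can overflow a worker thread's stack even when the pure MPI run is fine, so this is worth ruling out. A minimal sketch of the job-script change, assuming a stack overflow is the cause; the 256M value is illustrative, not a tuned recommendation:

```shell
#!/usr/bin/env bash
# Sketch, assuming the crash is a per-thread stack overflow.
# `ulimit -s unlimited` (already set above) covers only the main thread;
# OMP_STACKSIZE sets the stack size of each additional OpenMP thread.
export OMP_NUM_THREADS=2
export OMP_STACKSIZE=256M   # illustrative value, applied to each worker thread

# Then launch exactly as before:
# aprun -n 180 -N 18 /home/appuser/WRF/real/wrf.exe
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} OMP_STACKSIZE=${OMP_STACKSIZE}"
```

If the crash moves or disappears as the value is raised, that points to stack exhaustion rather than a logic error at line 1492.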