Hello all,
I’m pretty stumped by this error. Jobs run perfectly fine for the first 5-6 hours, but then stop abruptly. I know this isn’t a VASP issue because the same calculation runs fine when I submit it directly to the SLURM scheduler. The error below, from the FW_job.error output, seems to indicate that custodian believes the wall time has been reached, but the submit script created by atomate requests 48 hours, nowhere near the roughly 5 hours that the job ran for. I considered that the problem might be the “RuntimeWarning: divide by zero…” warnings that come after the wall time error, but I can find no documentation of them elsewhere, and I have no idea why they would arise many hours after the job began running. My submission script seems to be perfectly in order.
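To make the mismatch concrete, here is a minimal sketch of the sanity check involved: read the `--time` limit out of the submit script and compare it with how long the job had actually been running. The function names `parse_sbatch_time` and `would_stop` are hypothetical, and `would_stop` is only an illustration of a generic "time left is below a buffer" test, not custodian's actual WalltimeHandler logic; the 300-second buffer is an assumed value.

```python
import re


def parse_sbatch_time(script_text):
    """Return the #SBATCH --time limit in seconds, or None if not found.

    Assumption: only the [D-]HH:MM:SS form is handled in this sketch.
    """
    match = re.search(
        r"^#SBATCH\s+--time=(?:(\d+)-)?(\d+):(\d+):(\d+)\s*$",
        script_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def would_stop(wall_time_s, elapsed_s, buffer_s=300):
    """Hypothetical walltime test: True once less than buffer_s seconds remain."""
    return wall_time_s - elapsed_s < buffer_s


script = "#!/bin/bash -l\n#SBATCH --nodes=4\n#SBATCH --time=48:00:00\n"
limit = parse_sbatch_time(script)

print(limit)                               # 172800 seconds, i.e. 48 hours
print(would_stop(limit, 5.5 * 3600))       # False: 48 h limit, 5.5 h elapsed
print(would_stop(6 * 3600, 5.95 * 3600))   # True: a 6 h limit would trip here
```

If a check of this kind fired after 5-6 hours, the wall time it was given would have to be far shorter than the 48 hours in the script, so comparing the two values seems like a reasonable first step.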
Note: I’ve included my std_err file just in case, but I don’t believe the “huge pages” warning has anything to do with this, since I see the same warning in the jobs I submit manually to SLURM and it doesn’t affect them.
Does anyone have any ideas about what the problem could be?
For reference, I’m running atomate on NERSC for the first time, after having run it on other shared resources before.
Best,
Nick Winner
FW_job.error
/global/homes/n/nwinner/.conda/envs/matsci/lib/python3.6/site-packages/pymatgen/io/cif.py:37: UserWarning: Please install optional dependency pybtex if youwant to extract references from CIF files.
warnings.warn("Please install optional dependency pybtex if you"
{ 'actions': None,
'errors': [ 'Walltime '
'reached'],
'handler': <custodian.vasp.handlers.WalltimeHandler object at 0x2aaacaa29208>}
vasp_std: no process found
Unrecoverable error for handler: <custodian.vasp.handlers.WalltimeHandler object at 0x2aaacaa29208>
/global/homes/n/nwinner/.conda/envs/matsci/lib/python3.6/site-packages/pymatgen/core/lattice.py:1094: RuntimeWarning: divide by zero encountered in sqrt
return np.sqrt(d2)
/global/homes/n/nwinner/.conda/envs/matsci/lib/python3.6/site-packages/pymatgen/core/lattice.py:1094: RuntimeWarning: invalid value encountered in sqrt
return np.sqrt(d2)
FW_submit.script
#!/bin/bash -l
#SBATCH --nodes=4
#SBATCH --time=48:00:00
#SBATCH --partition=regular
#SBATCH --account=m1090
#SBATCH --job-name=FW_job
#SBATCH --output=FW_job-%j.out
#SBATCH --error=FW_job-%j.error
#SBATCH --constraint=knl
module load python/3.6-anaconda-4.4
source activate matsci
module load vasp/20170629-knl
export OMP_PROC_BIND=true
export OMP_PLACES=threads
export OMP_NUM_THREADS=4
cd /global/cscratch1/sd/nwinner/flibe/solutes/Cr/0
rlaunch -c /global/u1/n/nwinner/config/atomate rapidfire
# CommonAdapter (SLURM) completed writing Template
std_err.txt
···
libhugetlbfs [nid07183:185095]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07182:184826]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07188:183785]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07183:185096]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07188:183786]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07183:185097]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07182:184827]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07188:183787]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid06827:183854]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07183:185098]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07182:184828]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07188:183788]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid07182:184829]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid06827:183855]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid06827:183856]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
libhugetlbfs [nid06827:183853]: WARNING: Maximum number of huge page sizes exceeded, ignoring 8388608kB page size
PROFILE, used timers: 236