Error running atomate in offline mode?

Dear all,

I would like to run atomate in offline mode on a cluster with the LSF job management system. I have attached all the relevant files from running a test job in a zip file.
The error I got in “Si-structure_optimiz-41419.error” is:

/home1/xiaoh/miniconda3/lib/python3.7/site-packages/pymatgen/io/cif.py:44: UserWarning: Please install optional dependency pybtex if you want to extract references from CIF files.
  warnings.warn("Please install optional dependency pybtex if you"
Validation failed: <custodian.vasp.validators.VasprunXMLValidator object at 0x2b94f8d449b0>
Traceback (most recent call last):
  File "/home1/xiaoh/miniconda3/lib/python3.7/site-packages/custodian/custodian.py", line 320, in run
    self._run_job(job_n, job)
  File "/home1/xiaoh/miniconda3/lib/python3.7/site-packages/custodian/custodian.py", line 428, in _run_job
    raise CustodianError(s, True, v)
custodian.custodian.CustodianError: (CustodianError(…), 'Validation failed: <custodian.vasp.validators.VasprunXMLValidator object at 0x2b94f8d449b0>')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home1/xiaoh/miniconda3/lib/python3.7/site-packages/fireworks/core/rocket.py", line 262, in run
    m_action = t.run_task(my_spec)
  File "/home1/xiaoh/miniconda3/lib/python3.7/site-packages/atomate/vasp/firetasks/run_calc.py", line 204, in run_task
    c.run()
  File "/home1/xiaoh/miniconda3/lib/python3.7/site-packages/custodian/custodian.py", line 330, in run
    .format(self.total_errors, ex))
RuntimeError: 0 errors reached: (CustodianError(…), 'Validation failed: <custodian.vasp.validators.VasprunXMLValidator object at 0x2b94f8d449b0>'). Exited…

The error I got in “std_err.txt” is:

[mpiexec@n0114] HYDU_parse_hostfile (../../utils/args/args.c:535): unable to open host file: $CURDIR/nodelist.41419
[mpiexec@n0114] machine_file_fn (../../ui/mpich/utils.c:509): error parsing machine file
[mpiexec@n0114] match_arg (../../utils/args/args.c:243): match handler returned error
[mpiexec@n0114] HYDU_parse_array (../../utils/args/args.c:269): argument matching returned error
[mpiexec@n0114] parse_args (../../ui/mpich/utils.c:4009): error parsing input array
[mpiexec@n0114] HYD_uii_mpx_get_parameters (../../ui/mpich/utils.c:4339): unable to parse user arguments

My impression is that atomate did not invoke VASP properly under the LSF system. Note that I have modified the LoadSharingFacility_template.txt file; it is included in the attached zip file.

I also included my_fworker.yaml, in the offline folder of the attached zip file. I am confused about how to define vasp_cmd in my_fworker.yaml.

For reference, I have also attached the LSF script I usually use to submit VASP jobs on this cluster.

If I define vasp_cmd as "mpirun -np 16 vasp", the job does not run on the compute nodes. If I define vasp_cmd as "mpirun -np $NPROCS -machinefile $CURDIR/nodelist.$LSB_JOBID /home1/xiaoh/vasp/vasp.5.3-vtst/vasp > log", then the variables NPROCS, CURDIR, and LSB_JOBID are not defined, which leads to the error in std_err.txt shown above.
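For illustration, the relevant part of a my_fworker.yaml with the second definition would look roughly like the following (a minimal sketch in the standard FireWorks worker-file format; the worker name here is a placeholder, and my actual file is in the attached zip):

name: offline_worker
category: ''
query: '{}'
env:
    vasp_cmd: mpirun -np $NPROCS -machinefile $CURDIR/nodelist.$LSB_JOBID /home1/xiaoh/vasp/vasp.5.3-vtst/vasp > log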

Could you guide me on running atomate in offline mode on a cluster with the LSF job management system?

Thank you very much!

Hang Xiao

Postdoc

Columbia University

launcher_2019-02-05-19-53-01-178809.zip (129 KB)

vaspvtstTail.lsf (1.76 KB)

Hi Hang,

I agree the immediate problem is likely your vasp_cmd (based on your std_err.txt; also check vasp.out). I am not sure how to run on your specific computing center, or how to ensure that these environment variables are available on your compute nodes.

To get this working in the short term, my next step would likely be to add some print statements after this code:

atomate/vasp/firetasks/run_calc.py:116

if isinstance(vasp_cmd, six.string_types):
    vasp_cmd = os.path.expandvars(vasp_cmd)  # expand $VAR references from the environment
    vasp_cmd = shlex.split(vasp_cmd)  # split the command string into an argument list


This will show what vasp_cmd is actually being passed to custodian. It's possible you will want to expand the environment variables yourself before handing them to custodian.
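For instance, a quick standalone check (a minimal sketch using a command string like yours) shows how os.path.expandvars behaves when a variable is not set:

import os
import shlex

# NPROCS, CURDIR, and LSB_JOBID are only expanded if they exist in the
# environment of the running process.
cmd = ("mpirun -np $NPROCS -machinefile $CURDIR/nodelist.$LSB_JOBID "
       "/home1/xiaoh/vasp/vasp.5.3-vtst/vasp")
expanded = os.path.expandvars(cmd)  # unset variables stay as literal "$NAME" text
print(shlex.split(expanded))

Note that your std_err.txt shows "$CURDIR/nodelist.41419": LSB_JOBID was expanded (LSF sets it on the compute node) but CURDIR was not, which suggests CURDIR and NPROCS are simply not set in the environment your job runs in. One possibility (an untested sketch, assuming your LSF provides the standard LSB_HOSTS and LS_SUBCWD variables) would be to define and export them in your LoadSharingFacility_template.txt before the rlaunch line:

# Untested sketch for LoadSharingFacility_template.txt; LSB_HOSTS and LS_SUBCWD
# are standard LSF variables, while NPROCS and CURDIR must be derived by hand.
export CURDIR=$LS_SUBCWD                  # directory the job was submitted from
export NPROCS=$(echo $LSB_HOSTS | wc -w)  # LSB_HOSTS has one word per allocated slot
echo $LSB_HOSTS | tr ' ' '\n' > $CURDIR/nodelist.$LSB_JOBID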



However, a more pressing concern is what happens even after you get this running. Atomate is not really well designed for offline mode, and you will have problems with database insertion of results immediately after the VASP job finishes. This is discussed in some previous messages on this help list; see, for example:

https://groups.google.com/forum/#!searchin/atomate/offline%7Csort:date/atomate/dggbBsK628Q/2653lXFGAwAJ



Hi Anubhav,

Thank you for your help! It seems the best way forward is to get a dedicated IP address for the cluster and use MongoDB Atlas, to avoid the problems with running atomate offline.

Best,

Hang Xiao


Yes, you just need to make sure that your computing center is able to connect to the database (e.g., hosted on Atlas) and that there are no firewall restrictions.
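For example, a quick way to test the connection from a compute node (a minimal sketch, assuming pymongo is installed; the URI below is a placeholder for your own Atlas connection string) would be:

# Minimal connectivity check; replace the placeholder URI with your own.
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

uri = "mongodb+srv://<user>:<password>@<cluster>.mongodb.net"  # hypothetical URI
client = MongoClient(uri, serverSelectionTimeoutMS=5000)
try:
    client.admin.command("ping")  # fails fast if a firewall blocks the connection
    print("database reachable")
except ServerSelectionTimeoutError as exc:
    print("cannot reach database:", exc)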
