Should a terminate function be added to FireWorks?

Some job scheduling systems (like SLURM) need a special terminate function to work with custodian. For the reasons given in atomate pull#68 and custodian pull#47, I think the terminate function should perhaps be added to FireWorks. If so, where?

We discussed this a bit in atomate pull#68, which you linked to.

As far as I can tell, this terminate function is really for stuck VASP jobs and would be initiated by custodian (rather than FireWorks). If I understand correctly, you want to handle a VASP error both by quitting VASP and by explicitly killing the queue job that it is part of. If that is the case, I would say to just leave it as custom code, i.e.:

  • it is a “VASP error handler” procedure so that makes it appropriate for custodian, BUT

  • it involves queue interactions, which custodian apparently does not want to deal with as per your pull request to custodian.

I don’t really see a need for this in FireWorks, since it really interacts with custodian (which FireWorks does not assume is present) and seems tailored to stuck VASP jobs (which, again, FireWorks is not tailored to). I didn’t see a proposal for a more general function that would apply to all workflows (i.e. not just VASP) and so would be appropriate for FireWorks.

So I would suggest having your own branch of custodian with your own custom custodian error handler that has this function. That custodian error handler has the code that will terminate the SLURM job.
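
To make the suggestion concrete, here is a rough sketch of what such a custom handler might look like. It is only an illustration under several assumptions: the class name StuckJobSlurmTerminator, the helper build_scancel_command, the default file name vasp.out, and the one-hour timeout are all made up for this example. The class is written standalone so it runs without custodian installed; in a real branch it would presumably subclass custodian's ErrorHandler, whose check()/correct() shape it imitates. It treats a job as stuck when the output file has not been modified for a while, then calls scancel on the job ID that SLURM exports as SLURM_JOB_ID.

```python
import os
import subprocess
import time


def build_scancel_command(env=None):
    """Return the scancel command for the current SLURM job, or None if
    SLURM_JOB_ID is not set (i.e. we are not running inside a SLURM job)."""
    env = os.environ if env is None else env
    job_id = env.get("SLURM_JOB_ID")
    return ["scancel", job_id] if job_id else None


class StuckJobSlurmTerminator:
    """Hypothetical handler imitating the check()/correct() interface of a
    custodian ErrorHandler. Not part of custodian, atomate, or FireWorks."""

    is_monitor = True  # assumption: it should be polled while the job runs

    def __init__(self, output_filename="vasp.out", timeout=3600,
                 runner=subprocess.call, env=None):
        self.output_filename = output_filename  # file whose mtime we watch
        self.timeout = timeout                  # seconds of silence = "stuck"
        self.runner = runner                    # injectable, eases testing
        self.env = env                          # None means use os.environ

    def check(self):
        """Return True if the output file exists but has gone stale."""
        if not os.path.exists(self.output_filename):
            return False
        age = time.time() - os.path.getmtime(self.output_filename)
        return age > self.timeout

    def correct(self):
        """Cancel the enclosing SLURM job and report what was done."""
        cmd = build_scancel_command(self.env)
        if cmd is None:
            # Not inside a SLURM job, so there is nothing to cancel.
            return {"errors": ["Stuck job"], "actions": None}
        self.runner(cmd)
        return {"errors": ["Stuck job"],
                "actions": [{"action": "scancel", "command": cmd}]}
```

The scheduler-specific part is confined to build_scancel_command, so a different queue system would only need a different command there. The returned dictionary loosely follows the "errors"/"actions" convention of custodian handlers; the exact contents of "actions" here are an assumption.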

Unfortunately, this looks to be one case where we can’t find enough general interest to pull it into the common repos. Thanks for taking the time to try to contribute it back.

Anubhav

On Thursday, March 16, 2017 at 8:26:47 AM UTC-7, specter119 wrote:

Some job scheduling systems (like SLURM) need a special terminate function to work with custodian. For the reasons given in atomate pull#68 and custodian pull#47, I think the terminate function should perhaps be added to FireWorks. If so, where?