detect_lostruns while running jobs on multiple clusters

When I run

lpad detect_lostruns --fizzle

on one cluster while jobs are running across multiple clusters (e.g., some on Savio, some on Lawrencium), anything running on a cluster other than the one where the command was executed gets marked as FIZZLED, even if the job is still running on that other cluster.

Is there a way to run that command only against fireworks running on the cluster where the command is executed? Or could this be fixed in the future by using the "host" field from the firework's launch document?

Thanks!

Martin

Hi Martin

I just pushed a change so that detect_lostruns takes an optional argument called "launch_query". With it you can filter on any field of the launch document.

You should be able to set launch_query="{'host': 'my_host'}" or launch_query="{'host': {'$regex': 'my_host*'}}"

This should be included in the next release of FWS (v1.9.3). I haven't had a chance to test it, so let me know if it works for you.
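For readers who want to see what such a host filter selects, here is a minimal sketch in plain Python. It does not call FireWorks or MongoDB: `host_launch_query` and `launch_matches` are hypothetical helpers written for illustration, the matcher only imitates equality and `$regex` matching, and the host names in the sample launch documents are made up.

```python
import re


def host_launch_query(host_pattern, regex=False):
    """Return a Mongo-style filter on the launch document's "host" field."""
    if regex:
        return {"host": {"$regex": host_pattern}}
    return {"host": host_pattern}


def launch_matches(launch_doc, launch_query):
    """Tiny local stand-in for MongoDB matching (equality and $regex only)."""
    for key, cond in launch_query.items():
        value = launch_doc.get(key)
        if isinstance(cond, dict) and "$regex" in cond:
            # Like MongoDB's $regex, this is an unanchored search.
            if not isinstance(value, str) or re.search(cond["$regex"], value) is None:
                return False
        elif value != cond:
            return False
    return True


# Hypothetical launch documents; real host names depend on the cluster.
launches = [
    {"launch_id": 1, "host": "n0001.savio2", "state": "RUNNING"},
    {"launch_id": 2, "host": "n0042.lr6", "state": "RUNNING"},
]

# Select only launches whose host name contains ".savio".
query = host_launch_query(r"\.savio", regex=True)
selected = [doc["launch_id"] for doc in launches if launch_matches(doc, query)]
print(selected)
```

Going by the reply above, the same dictionary would presumably be what gets passed as launch_query, so that only launches on the matching host are considered lost. Two details are worth checking when adapting this: as a regular expression, 'my_host*' means "my_hos" followed by any number of "t" characters, so an anchored pattern such as '^my_host' is stricter; and if the query is typed inside double quotes in a shell, the "$" in "$regex" needs escaping so the shell does not expand it.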

In reply to Martin Siron's message of Wednesday, June 12, 2019 at 10:17:33 AM UTC-7.

Thanks, that worked!

In reply to Anubhav Jain's message of Monday, June 24, 2019 at 4:54:11 PM UTC-7.
