queue installation clarification?

Hi,
Could someone clarify for me how I should install Fireworks and MongoDB in the following situation? I have a standalone (that is, non-cluster) server that I’d like to use as my launch pad. This machine will also run some of the jobs in my workflow. I also have a large cluster which will run the other jobs in the workflow. The cluster consists of a login node and many compute nodes, and jobs are submitted from the login node to the compute nodes using slurm.
I’ve installed Fireworks on the standalone machine and run some of the tutorials, and that all works great. I’m not clear, however, on whether I need to install Fireworks and MongoDB on the login node of the cluster. Is one of these correct…

  1. qlaunch on the standalone machine will talk directly to slurm on the cluster login node (via ssh?), so Fireworks does not need to be installed on the cluster login node.
  2. qlaunch on the standalone machine will talk to Fireworks on the cluster login node, so Fireworks does need to be installed on the login node.

And in the second case do I also need MongoDB on the login node?

Any guidance would be appreciated.

Matthew

Hi Matthew,

You only need to install MongoDB on one machine, in this case your standalone machine. The machine that hosts MongoDB does not itself need to have FireWorks installed.

You need to install FireWorks anywhere that you run workflows. In your case, it is the standalone machine and the login node of your cluster since both will be running workflows. Basically, in order to run workflows you need to have the “rlaunch” command that pulls jobs and executes them.

You can then operate things via mechanism #2 above (but not mechanism #1). For more details, you should look at the “remote qlaunch” part of the docs:

https://pythonhosted.org/FireWorks/queue_tutorial.html

Hope that helps. Feel free to ask again if you run into issues. I would suggest getting things working with normal qlaunch before moving on to remote qlaunch.

Best

Anubhav

···

On Thu, Nov 12, 2015 at 1:59 PM, Matthew Cahn [email protected] wrote:

You received this message because you are subscribed to the Google Groups “fireworkflows” group.

To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].

To view this discussion on the web visit https://groups.google.com/d/msgid/fireworkflows/4e2d8bcb-3f2f-4f54-ae4a-361e725fb53f%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Thanks Anubhav, this was very helpful. I now have MongoDB installed on the standalone machine, and Fireworks on both the standalone machine and the cluster login node. I’m able to run jobs on the standalone machine, and on the cluster by using qlaunch on the login node in reservation/offline mode.

Now I’d like to ask if you could clarify what the --remote_host argument to qlaunch does. It seems like this should make it possible to run qlaunch on the standalone machine and have it execute jobs on the cluster. Using that I could do all the work of adding jobs to the workflow, and launching them, from the standalone machine. Is that correct?

I ran into a couple of issues trying to use --remote_host (-rh). On the standalone machine I ran this (“tiger” is the cluster login node):

qlaunch -rh tiger -rc -ru mcahn -r singleshot

First, I find that only rapidfire will work. singleshot complains that there are extra arguments to qlaunch. Something seems to add some arguments that are specific to rapidfire:

[tiger] run: qlaunch --reserve singleshot --maxjobs_queue None --maxjobs_block None --nlaunches None
[tiger] out: qlaunch: error: unrecognized arguments: --maxjobs_queue None --maxjobs_block None --nlaunches None
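
The log above suggests that the remote command is assembled with the rapidfire-only options appended regardless of the subcommand, and with unset values rendered as the string "None". Here is a minimal sketch of how such a command line could be built so that options are added only when they apply and have a value. The helper name build_remote_qlaunch_command is hypothetical and is not FireWorks' own code; the flag names are the ones shown in the log.

```python
def build_remote_qlaunch_command(command, reserve=False, maxjobs_queue=None,
                                 maxjobs_block=None, nlaunches=None):
    """Return a qlaunch command line (hypothetical helper, not FireWorks code)."""
    parts = ["qlaunch"]
    if reserve:
        parts.append("--reserve")
    parts.append(command)
    if command == "rapidfire":
        # These options are only accepted by rapidfire; skip any that are unset
        # so that the literal string "None" never reaches the command line.
        options = [
            ("--maxjobs_queue", maxjobs_queue),
            ("--maxjobs_block", maxjobs_block),
            ("--nlaunches", nlaunches),
        ]
        for flag, value in options:
            if value is not None:
                parts.extend([flag, str(value)])
    return " ".join(parts)


print(build_remote_qlaunch_command("singleshot", reserve=True))
print(build_remote_qlaunch_command("rapidfire", reserve=True, nlaunches=5))
```

With this approach singleshot never receives the rapidfire options, and rapidfire receives only the ones that were actually set.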

With rapidfire I get a complaint from queue_adapter.py. (In this example I’ve modified queue_adapter.py to print self.command and kwargs.) It seems to be complaining about the -1 (which is subprocess.PIPE) in kwargs:

qlaunch -rh tiger -rc -ru mcahn -r rapidfire

[tiger] out: 2015-11-16 15:43:49,699 INFO getting queue adapter
[tiger] out: 2015-11-16 15:43:49,701 INFO Found previous block, using /path/to/queue_tests/block_2015-11-16-20-28-37-685034
[tiger] out: 2015-11-16 15:43:49,719 ERROR ----|vvv|----
[tiger] out: 2015-11-16 15:43:49,721 ERROR Error trying to get the number of jobs in the queue
[tiger] out: The error response reads: Traceback (most recent call last):
[tiger] out: File "/path/to/fireworks/fireworks/queue/queue_adapter.py", line 59, in target
[tiger] out: self.process = subprocess.Popen(self.command, **kwargs)
[tiger] out: File "/usr/lib64/python2.7/subprocess.py", line 710, in __init__
[tiger] out: errread, errwrite)
[tiger] out: File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
[tiger] out: raise child_exception
[tiger] out: TypeError: execv() arg 2 must contain only strings
[tiger] out:
[tiger] out: [u'squeue', u'-o "%u"', u'-u', 'mcahn', u'-p', None, u'-h']
[tiger] out: {u'stderr': -1, u'stdout': -1}
[tiger] out: 2015-11-16 15:43:49,722 ERROR ----|^^^|----

Any guidance on using qlaunch with --remote_host, and on what might be going wrong in queue_adapter.py would be much appreciated.

Best,
Matthew

Hi Matthew

I did a quick Google search for the TypeError contents and this is the first hit:

http://stackoverflow.com/questions/20624342/typeerror-execv-arg-2-must-contain-only-strings

My guess is that the most immediate source of the error is the None passed as the value for -p (the Slurm partition) in the squeue command you printed: every argument in the command list must be a string. Hopefully that allows you to make progress. Let me know if there is some bug in FWS that you think needs to be addressed.
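
To illustrate the point about None, here is a minimal sketch that checks a command list for non-string arguments before it is handed to subprocess, and builds the squeue call from the log while leaving out -p when no partition is set. Both helpers (non_string_args and build_squeue_command) are hypothetical names used for illustration, not FireWorks' own code, and the sketch assumes Python 3 rather than the Python 2.7 shown in the traceback.

```python
def non_string_args(command):
    """Return (index, value) pairs for arguments that are not strings."""
    return [(i, arg) for i, arg in enumerate(command) if not isinstance(arg, str)]


def build_squeue_command(username, partition=None):
    """Build the squeue call seen in the log, omitting -p when no partition is set.

    Hypothetical helper for illustration; not FireWorks' own code.
    """
    command = ["squeue", '-o "%u"', "-u", username]
    if partition is not None:
        # Only pass -p when there is an actual partition name to give it.
        command += ["-p", partition]
    command.append("-h")
    return command


# The command list printed in the log, with None as the value for -p.
bad = ["squeue", '-o "%u"', "-u", "mcahn", "-p", None, "-h"]
print(non_string_args(bad))
print(build_squeue_command("mcahn"))
```

A check like this points directly at the offending argument, which is easier to read than the execv() TypeError raised from inside subprocess.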

···

On Mon, Nov 16, 2015 at 1:40 PM, Matthew Cahn [email protected] wrote:
