Thanks Anubhav, this was very helpful. I now have MongoDB installed on the standalone machine, and Fireworks on both the standalone machine and the cluster login node. I’m able to run jobs on the standalone machine, and on the cluster by using qlaunch on the login node in reservation/offline mode.
Now I’d like to ask if you could clarify what the --remote_host argument to qlaunch does. It seems like this should make it possible to run qlaunch on the standalone machine and have it execute jobs on the cluster. Using that I could do all the work of adding jobs to the workflow, and launching them, from the standalone machine. Is that correct?
I ran into a couple of issues trying to use --remote_host (-rh). On the standalone machine I ran this (“tiger” is the cluster login node):
qlaunch -rh tiger -rc -ru mcahn -r singleshot
First, I find that only rapidfire will work; with singleshot, qlaunch complains about unrecognized arguments. Something seems to append rapidfire-specific options to the remote command:
[tiger] run: qlaunch --reserve singleshot --maxjobs_queue None --maxjobs_block None --nlaunches None
[tiger] out: qlaunch: error: unrecognized arguments: --maxjobs_queue None --maxjobs_block None --nlaunches None
With rapidfire I get a complaint from queue_adapter.py. (In this example I’ve modified queue_adapter.py to print self.command and kwargs.) It seems to be complaining about the -1 (which is subprocess.PIPE) in kwargs:
qlaunch -rh tiger -rc -ru mcahn -r rapidfire
[tiger] out: 2015-11-16 15:43:49,699 INFO getting queue adapter
[tiger] out: 2015-11-16 15:43:49,701 INFO Found previous block, using /path/to/queue_tests/block_2015-11-16-20-28-37-685034
[tiger] out: 2015-11-16 15:43:49,719 ERROR ----|vvv|----
[tiger] out: 2015-11-16 15:43:49,721 ERROR Error trying to get the number of jobs in the queue
[tiger] out: The error response reads: Traceback (most recent call last):
[tiger] out: File "/path/to/fireworks/fireworks/queue/queue_adapter.py", line 59, in target
[tiger] out: self.process = subprocess.Popen(self.command, **kwargs)
[tiger] out: File "/usr/lib64/python2.7/subprocess.py", line 710, in __init__
[tiger] out: errread, errwrite)
[tiger] out: File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
[tiger] out: raise child_exception
[tiger] out: TypeError: execv() arg 2 must contain only strings
[tiger] out:
[tiger] out: [u'squeue', u'-o "%u"', u'-u', 'mcahn', u'-p', None, u'-h']
[tiger] out: {u'stderr': -1, u'stdout': -1}
[tiger] out: 2015-11-16 15:43:49,722 ERROR ----|^^^|----
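To double-check, I can reproduce the same TypeError outside FireWorks with a minimal sketch. The command list below is copied from the log (note the None following -p, presumably an unset partition name); the filtering at the end is only an illustration of what a workaround might look like, not a suggested patch to queue_adapter.py:

```python
import subprocess

# The argv list from the log: subprocess.Popen requires every element
# to be a string, so the None (the unset value after "-p") is rejected.
cmd = ["squeue", '-o "%u"', "-u", "mcahn", "-p", None, "-h"]
try:
    subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
except TypeError as exc:
    print("Popen rejected the command:", exc)

# Illustration only: drop any flag whose value is None, along with the
# None itself, before handing the list to Popen.
clean = []
skip_next = False
for tok, nxt in zip(cmd, cmd[1:] + [""]):
    if skip_next:          # this token is the None we already decided to drop
        skip_next = False
        continue
    if nxt is None:        # this flag's value is missing; drop the pair
        skip_next = True
        continue
    if tok is None:
        continue
    clean.append(tok)

print(clean)  # ['squeue', '-o "%u"', '-u', 'mcahn', '-h']
```

So the stderr/stdout values of -1 (subprocess.PIPE) are fine on their own; it is the None inside the command list that trips execv().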
Any guidance on using qlaunch with --remote_host, and on what might be going wrong in queue_adapter.py, would be much appreciated.
Best,
Matthew
On Thursday, November 12, 2015 at 1:59:18 PM UTC-5, Matthew Cahn wrote:
Hi,
Could someone clarify for me how I should install Fireworks and MongoDB in the following situation? I have a standalone (that is, non-cluster) server that I’d like to use as my launch pad. This machine will also run some of the jobs in my workflow. I also have a large cluster which will run the other jobs in the workflow. The cluster consists of a login node and many compute nodes, and jobs are submitted from the login node to the compute nodes using slurm.
I’ve installed Fireworks on the standalone machine and run some of the tutorials, and that all works great. I’m not clear, however, on whether I need to install Fireworks and MongoDB on the login node of the cluster. Is one of these correct…
- qlaunch on the standalone machine will talk directly to slurm on the cluster login node (via ssh?), so Fireworks does not need to be installed on the cluster login node.
- qlaunch on the standalone machine will talk to Fireworks on the cluster login node, so Fireworks does need to be installed on the login node.
And in the second case do I also need MongoDB on the login node?
Any guidance would be appreciated.
Matthew