Running FireWorks on Google Cloud Platform?

Does anyone have experience running FireWorks workflows on Google Cloud Platform (GCP)?

The only thing I see online about FireWorks in any cloud is the MongoDB server. [It’s straightforward to install MongoDB on Google Compute Engine (GCE) or use a managed MongoDB service on GCP.]

I’m helping a team port a FireWorks scientific workflow from a SLURM cluster to Google Compute Engine. So far, FireWorks seems quite adaptable and it partitions responsibilities very neatly for this.

Key differences about running in GCE:

  • GCE doesn’t seem to have a suitable job queue. (There’s the App Engine task queue, the Cloud Run platform for load balancing among HTTP workers using up to 2GB RAM, Kubernetes Engine, and such, but nothing seems suitable.)
  • I don’t think we need a job queue. You can create as many workers as you want as GCE VMs (virtual machines). Each worker can rapidfire launch rockets, then shut down when idle for long enough.
  • While you can set up an NFS shared file service, it costs literally 1/10 as much to use Google Cloud Storage (GCS), and GCS scales up better. But GCS is a blob store, not a file service: you read or write an entire file atomically, with no support for multiple accessors updating it concurrently.
  • I think it’s best to fetch inputs from GCS, run a rocket, then store its outputs back to GCS. We’ve worked out our Firework inputs & outputs.
    • The alternative is to use gcsfuse to mount a storage directory, but it has a bunch of caveats. For one thing, GCS doesn’t have directory nodes. Creating fake directory entries (empty objects whose names end in /) greatly speeds up gcsfuse.
  • Rather than trying to load each developer’s experiment application code and environment (pip packages, Linux apt packages, config files, environment variables) onto a worker disk image before creating the workers, or mounting some of that via NFS or gcsfuse, a clean approach is to package the payload environment and application code into a Docker image. Each rocket would then run a Firetask that does this:
    • pull a Docker image and start a container
    • fetch needed input files from GCS
    • map those files into the container’s file space
    • run a command line in the container that runs the payload Firetask
    • store output files to GCS
    • delete the container
    • delete the input & output files.
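The per-rocket steps listed above could be sketched roughly like this. It is a minimal, untested sketch that assumes the `docker` and `gsutil` CLIs are installed on the worker VM and that the VM’s service account can read and write the GCS paths; the image name, bucket paths, and the `/work` mount point are hypothetical. In FireWorks, a Firetask (a `FiretaskBase` subclass registered with `@explicit_serialize`) could call `run_payload` from its `run_task(self, fw_spec)` method; that wrapper is left out so the sketch runs without FireWorks installed.

```python
"""Sketch: fetch inputs from GCS, run a payload in Docker, store outputs to GCS."""
import os
import shutil
import subprocess
import tempfile
from typing import List, Sequence, Tuple

# Each entry is (gs:// URI, path relative to the scratch directory). Hypothetical layout.
FileMap = Sequence[Tuple[str, str]]


def payload_commands(image: str, inputs: FileMap, outputs: FileMap,
                     command: str, workdir: str) -> List[List[str]]:
    """Build the argv lists for one payload run, in execution order."""
    cmds = [["docker", "pull", image]]
    # Fetch each input object from GCS into the scratch directory.
    for uri, rel in inputs:
        cmds.append(["gsutil", "cp", uri, os.path.join(workdir, rel)])
    # Bind-mount the scratch directory at /work and run the payload command there.
    # --rm makes Docker delete the container when the command exits.
    # Assumes the image provides /bin/sh.
    cmds.append(["docker", "run", "--rm", "-v", f"{workdir}:/work", "-w", "/work",
                 image, "sh", "-c", command])
    # Store each output file back to GCS.
    for uri, rel in outputs:
        cmds.append(["gsutil", "cp", os.path.join(workdir, rel), uri])
    return cmds


def run_payload(image: str, inputs: FileMap, outputs: FileMap, command: str) -> None:
    """Run every step, then delete the local input & output files."""
    workdir = tempfile.mkdtemp(prefix="payload-")
    try:
        # Create parent directories for nested input paths before copying into them.
        for _, rel in inputs:
            os.makedirs(os.path.dirname(os.path.join(workdir, rel)), exist_ok=True)
        for argv in payload_commands(image, inputs, outputs, command, workdir):
            subprocess.run(argv, check=True)  # any failure raises, failing the rocket
    finally:
        # Files the container wrote as root may not be removable by this user;
        # ignore_errors keeps cleanup from masking the real exception.
        shutil.rmtree(workdir, ignore_errors=True)
```

For example, `run_payload("gcr.io/my-project/payload:latest", [("gs://my-bucket/in/params.json", "params.json")], [("gs://my-bucket/out/result.csv", "result.csv")], "python run.py")` would cover all seven steps, with `docker run --rm` handling “start a container”, “run a command line”, and “delete the container” in one call.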

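If you do go the gcsfuse route, the fake directory entries can be generated from the existing object names. As far as I know, gcsfuse’s `--implicit-dirs` flag is the alternative that infers directories without placeholders, at the cost of extra listing calls. This sketch assumes the `google-cloud-storage` package and credentials for the bucket; the bucket name is whatever you pass in, and `create_placeholders` is a hypothetical helper, not part of gcsfuse.

```python
"""Sketch: create gcsfuse-style directory placeholder objects in a GCS bucket."""
from typing import Iterable, Set


def placeholder_names(blob_names: Iterable[str]) -> Set[str]:
    """Return every parent "directory" prefix of the given object names, each ending in '/'."""
    dirs: Set[str] = set()
    for name in blob_names:
        parts = name.split("/")[:-1]  # drop the final component (the file name)
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]) + "/")
    return dirs


def create_placeholders(bucket_name: str) -> None:
    """Upload an empty object for each directory placeholder that doesn't exist yet."""
    # Imported lazily so placeholder_names() works without the GCS client library.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    existing = {blob.name for blob in bucket.list_blobs()}
    for name in sorted(placeholder_names(existing) - existing):
        bucket.blob(name).upload_from_string(b"")  # empty object named "some/dir/"
```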
I’m building pieces and have yet to run into any problems, just a bunch to learn and implement. The code so far will create workers that launch rockets rapidfire from a requested DB then shut themselves down. It merrily runs tutorial Firetasks. Soon I’ll implement the Firetask that runs Docker payloads.
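The worker loop (launch rockets until idle for long enough, then shut down) might look roughly like the sketch below. It calls FireWorks’ `launch_rocket` in a loop instead of `rapidfire` so the idle timeout is explicit, assuming `launch_rocket` returns a truthy value when it actually ran a Firework (check this against your FireWorks version). `run_until_idle`, the 15-minute limit, and the shutdown command are illustrative choices, not FireWorks APIs.

```python
"""Sketch: launch rockets until the worker has been idle long enough, then stop the VM."""
import subprocess
import time
from typing import Callable


def run_until_idle(launch_once: Callable[[], bool], idle_limit_s: float,
                   poll_interval_s: float = 30.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Call launch_once() until idle_limit_s passes with no work; return rockets run."""
    launched = 0
    last_work = clock()
    while True:
        if launch_once():
            launched += 1
            last_work = clock()
        elif clock() - last_work >= idle_limit_s:
            return launched
        else:
            sleep(poll_interval_s)  # nothing READY right now; wait and poll again


def main() -> None:
    """Hypothetical worker entry point, e.g. invoked from the VM's startup script."""
    from fireworks import FWorker, LaunchPad
    from fireworks.core.rocket_launcher import launch_rocket

    lp = LaunchPad.auto_load()  # loads the LaunchPad config for the requested DB
    worker = FWorker.auto_load()
    run_until_idle(lambda: bool(launch_rocket(lp, worker)), idle_limit_s=15 * 60)
    # Halting the guest OS stops the GCE instance; its persistent disk is still billed.
    subprocess.run(["sudo", "shutdown", "-h", "now"], check=False)
```

Injecting `clock` and `sleep` keeps the idle logic testable without real waiting.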

Questions:

  • Is there any experience to build on?
  • Is anyone interested in using this code? May I contribute it to the FireWorks community or to the FireWorks project?
  • What’s the last FireWorks release fully tested on Python 2.7? (My aim is to complete the cloud migration before the Python 3 migration.)
  • In the docs, does “FireServer” mean “the server running MongoDB”? Ditto for “LaunchPad”?

Happy fireworking!

Hi,

I think it’s great that you’re considering running FireWorks on Google Cloud Platform. In response to your questions:

  1. Unfortunately, no. We have never run on GCP or even AWS, and there is nothing planned for the foreseeable future.

  2. I think it would be a great contribution to the FireWorks codebase. However, I unfortunately wouldn’t be able to help much with testing, etc., since we don’t have anything planned with GCP.

  3. The last py2 test that seems to have passed is this one: https://circleci.com/gh/materialsproject/fireworks/2263
    So you could either use FWS v1.9.4 or check out commit: 6505cd52ca3a7f34b549e6a18c7e2d0d336aa066

I think most incompatibilities with Py3 are due to the underlying libraries and not due to Py3-specific code in FireWorks itself.

  4. Yes, the terminology used in the docs is unclear. I think I initially meant “FireServer” to refer to the physical server that hosts your MongoDB install / LaunchPad, and “LaunchPad” to refer more to the abstract object (e.g., one could imagine a LaunchPad with no external server at all, but rather a flat file). I might have used the terms interchangeably, though.

Let me know how it goes! Happy to answer any more questions that come up along the way as I am able.
