Does anyone have experience running FireWorks workflows on Google Cloud Platform (GCP)?
The only thing I see online about FireWorks in any cloud is the MongoDB server. [It’s straightforward to install MongoDB on Google Compute Engine (GCE) or use Google’s managed MongoDB service.]
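For reference, pointing FireWorks at a MongoDB instance on GCE is just the standard my_launchpad.yaml; the host and credentials below are placeholders, not real values:

```yaml
# my_launchpad.yaml -- placeholder values; substitute your GCE VM's
# internal address (or managed MongoDB endpoint) and credentials.
host: 10.128.0.2
port: 27017
name: fireworks
username: fw_admin
password: changeme
```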
I’m helping a team port a FireWorks scientific workflow from a SLURM cluster to Google Compute Engine. So far, FireWorks seems quite adaptable and it partitions responsibilities very neatly for this.
Key differences about running in GCE:
- GCE doesn’t seem to have a suitable job queue. (There’s the App Engine task queue, the Cloud Run platform for load balancing among HTTP workers using up to 2GB RAM, Kubernetes Engine, and such, but nothing seems suitable.)
- I don’t think we need a job queue. You can create as many workers as you want as GCE VMs (virtual machines). Each worker can launch rockets rapidfire, then shut down after it has been idle long enough.
- While you can set up an NFS shared file service, Google Cloud Storage (GCS) costs literally 1/10 as much and scales up better. But GCS is a blob store, not a file service: you atomically read or write an entire object, with no support for multiple concurrent accessors.
- I think it’s best to fetch inputs from GCS, run a rocket, then store its outputs back to GCS. We’ve worked out our Firework inputs & outputs.
- The alternative is to use gcsfuse to mount a storage bucket as a directory, but it has a bunch of caveats. For one thing, GCS doesn’t have directory nodes. If you create fake directory entries (empty objects with names ending in `/`), that greatly speeds up gcsfuse.
- Rather than trying to load each developer’s experiment application code and environment (pips, linux apts, config files, environment variables) onto a worker disk image before creating the workers, or mounting some of that via NFS or gcsfuse, a clean approach is to package the payload environment and application code into a Docker image. Each rocket would then run a Firetask that does this:
- pull a Docker image and start a container
- fetch needed input files from GCS
- map those files into the container’s file space
- run a command line in the container that runs the payload Firetask
- store output files to GCS
- delete the container
- delete the input & output files.
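The container step of that Firetask mostly reduces to constructing a `docker run` command line. Here's a minimal sketch; the helper name, the `/work` mount point, and the flag choices are my assumptions, not an established FireWorks API (written without f-strings so it also runs on Python 2.7):

```python
import shlex

def payload_command(image, job_dir, payload_cmd):
    """Build the `docker run` argv for one rocket's payload.

    Hypothetical helper: the worker would first fetch inputs from GCS
    into job_dir, run this command, then copy outputs from job_dir back
    to GCS and delete the local files.
    """
    return ([
        "docker", "run", "--rm",            # --rm deletes the container on exit
        "-v", "{0}:/work".format(job_dir),  # map input/output files into the container
        "-w", "/work",                      # run the payload in the mapped directory
        image,
    ] + shlex.split(payload_cmd))
```

A Firetask could then hand the result to `subprocess.check_call()`, which covers the pull (implicitly, on first run), run, and delete steps in one shot.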
I’m building pieces and have yet to run into any problems, just a bunch to learn and implement. The code so far will create workers that launch rockets rapidfire from a requested DB then shut themselves down. It merrily runs tutorial Firetasks. Soon I’ll implement the Firetask that runs Docker payloads.
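The launch-until-idle-then-shut-down loop can be sketched as follows. This is an assumption-laden outline, not the actual code: `try_launch` stands in for a thin wrapper around a FireWorks launch call, and the timeout defaults are made up. (FireWorks' own `rapidfire()` has `nlaunches`/`timeout` options that may cover part of this.)

```python
import time

def launch_until_idle(try_launch, idle_timeout_s=600, poll_s=15):
    """Loop a worker until it has been idle for idle_timeout_s seconds.

    try_launch() is assumed to attempt one rocket launch and return
    True if it actually ran one (hypothetical wrapper).
    """
    last_launch = time.time()
    while time.time() - last_launch < idle_timeout_s:
        if try_launch():
            last_launch = time.time()  # did work; reset the idle clock
        else:
            time.sleep(poll_s)         # nothing ready; poll again later
    # Idle long enough: the worker would now shut itself down, e.g. by
    # deleting its own GCE instance via the gcloud CLI or compute API.
```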
Questions:
- Is there any experience to build on?
- Is anyone interested in using this code? May I contribute it to the FireWorks community or to the FireWorks project?
- What’s the last FireWorks release fully tested on Python 2.7? (My aim is to complete the cloud migration before the Python 3 migration.)
- In the docs, does “FireServer” mean “the server running MongoDB”? Ditto for “LaunchPad”?
Happy fireworking!