Workflow tests

I think that we should try to phase out testing entire workflows. These are mostly integration tests, are quite slow and IMO not very helpful.

Looking at the existing workflow tests (atomate/vasp/workflows/tests), the following things are actually being tested:

  • The parent links are set up correctly by the preset or base wf and the wf runs in the order it should. This is tested in general by the test_vasp_workflows module. Individual workflows can check that the dependency structure of the DAG is correct with a unit test of the created workflow object; you don’t have to actually run it (see the sketch after this list).

  • Powerups and other spec modifications apply correctly. These should also be a part of fast unit tests for both the powerups and introspection of workflow objects.

  • We can connect to the database and get out the values we intended to enter. We could just test the *ToDbTask Firetasks directly: put the data into a dictionary and check that dictionary.

  • The inputs are written correctly. Sometimes this comes from a powerup, which is addressed above. Customizations of the INCAR settings or VASP input set can be tested just by running tests on the generated dict (also covered in the sketch after this list). We should have some tests that ensure we are calling pymatgen correctly in the write_inputs tasks, and we don’t need to test so many individual cases at the workflow level.

  • Check that the right things are being parsed and the correct results are obtained. Unit tests again.
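To make the first and fourth points concrete, here is a minimal sketch of what such unit tests could look like. The toy workflow factory is a hypothetical stand-in (a real test would import the actual atomate preset or base workflow), and the assertions are only illustrative; the FireWorks Workflow.links attribute and pymatgen’s MPRelaxSet(..., user_incar_settings=...).incar are standard, though generating the INCAR dict may behave slightly differently across pymatgen versions.

```python
import unittest

from fireworks import Firework, ScriptTask, Workflow
from pymatgen.core import Lattice, Structure
from pymatgen.io.vasp.sets import MPRelaxSet


def make_toy_wf():
    """Hypothetical stand-in for an atomate preset/base workflow factory."""
    opt = Firework([ScriptTask.from_str("echo optimize")], name="structure optimization")
    static = Firework([ScriptTask.from_str("echo static")], name="static", parents=[opt])
    return Workflow([opt, static], name="toy workflow")


class TestWorkflowConstruction(unittest.TestCase):
    def test_dag_links(self):
        # Check the dependency structure of the created workflow without running it.
        wf = make_toy_wf()
        ids = {fw.name: fw.fw_id for fw in wf.fws}
        # "static" must be a child of "structure optimization" in the DAG
        self.assertIn(ids["static"], wf.links[ids["structure optimization"]])

    def test_incar_customization(self):
        # Check INCAR customizations directly on the generated dict,
        # without writing files or running anything.
        struct = Structure(Lattice.cubic(3.0), ["Si", "Si"],
                           [[0, 0, 0], [0.25, 0.25, 0.25]])
        vis = MPRelaxSet(struct, user_incar_settings={"ENCUT": 700})
        self.assertEqual(vis.incar["ENCUT"], 700)


if __name__ == "__main__":
    unittest.main()
```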
Other reasons not to have workflow tests (some repeated):

  • They are slow. The 4 workflow tests took 82% of the total test time on my computer (89s of 108 total seconds on average).

  • They don’t test anything that couldn’t be tested in smaller unit tests that fail faster and give clearer indications of the real problem.

  • They are hard to debug. Unless the fix is obvious, you have to turn debug mode on, rerun (the slow test), look at the outputs, and sometimes fish around in the database. Sometimes a setting is wrong in the middle of the workflow and you have to back up multiple levels to the Firetask that was given the wrong settings.

  • They require input and output files that add to the size and maintenance burden. The entire atomate project is 53% tests by size (the workflow tests alone are 38 MB of the 77 MB total), or 24% when including all of the .git history, which test files also contribute to.

  • Half of the code in each test is the same boilerplate to set up the test, run fake VASP, and access the database (a code smell telling us we can do better; see the sketch below).
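On that last point, most of the repeated setup (scratch directory, LaunchPad reset, database cleanup) could live in a single shared base class that every test inherits instead of being copy-pasted. A rough sketch of the shape it could take; the class and database names here are hypothetical, and it assumes a throwaway MongoDB instance on localhost:

```python
import os
import shutil
import tempfile
import unittest

from fireworks import LaunchPad
from pymongo import MongoClient

TESTDB_NAME = "atomate_unittest"  # hypothetical name for a throwaway test database


class AtomateTestCase(unittest.TestCase):
    """Shared setup/teardown so individual tests don't repeat the boilerplate."""

    def setUp(self):
        # isolated scratch directory for any files the test writes
        self.scratch_dir = tempfile.mkdtemp()
        self.cwd = os.getcwd()
        os.chdir(self.scratch_dir)
        # fresh LaunchPad pointing at the throwaway database
        self.lp = LaunchPad(name=TESTDB_NAME)
        self.lp.reset("", require_password=False)

    def tearDown(self):
        os.chdir(self.cwd)
        shutil.rmtree(self.scratch_dir, ignore_errors=True)
        MongoClient().drop_database(TESTDB_NAME)
```

Individual tests would then only build their workflow or Firetask and make assertions, with the database and filesystem handling hidden in the base class.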

A good example of what to do is what Kiran did with the Gibbs quasiharmonic Debye model. Even though the analysis code is mostly in pymatgen now, that analysis was covered by unit tests that verified the inputs and outputs were correct.

We should keep the one integration (full workflow) test, test_vasp_workflows.py, but having one test per workflow doesn’t make sense in my opinion. Instead, we should focus on writing small tests no higher than the individual Firework level, but mostly at the Firetask or analysis level (see the sketch below). If the Firework-level tests are passing, we should already know that atomate workflows will run correctly unless they are designed incorrectly at creation (you shouldn’t need to run them to find out if they work). There are some exceptions for features like pass_calc_locs, which deserve their own tests, but even these are isolated and don’t need to be re-tested over and over again in workflow tests.
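As a concrete illustration of testing at the Firetask level, a task can be exercised in isolation by calling run_task directly on a spec dictionary. ToyToDbTask below is a hypothetical stand-in for a *ToDbTask; a real test would import the actual atomate task and point it at a reference output directory instead of fake spec values.

```python
from fireworks import FiretaskBase, FWAction, explicit_serialize


@explicit_serialize
class ToyToDbTask(FiretaskBase):
    """Hypothetical stand-in for a *ToDbTask: 'parse' some values and report
    what would be inserted, instead of writing to a real database."""

    def run_task(self, fw_spec):
        doc = {"energy_per_atom": fw_spec["total_energy"] / fw_spec["nsites"]}
        return FWAction(stored_data=doc)


def test_toy_to_db_task():
    # Call the task directly on a spec dict -- no LaunchPad, no workflow run.
    action = ToyToDbTask().run_task({"total_energy": -10.0, "nsites": 2})
    assert action.stored_data["energy_per_atom"] == -5.0
```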

What do you all think? Has anyone found bugs with workflow tests? Have you ever written code that broke (or might have broken) a workflow test?

Hi Brandon,

Thanks for these thoughtful comments. I’ll give them some thought soon and be in touch with more detailed comments. I agree that some of this is getting out of control and we need to do something.

Best,

Anubhav


Hi Brandon,

I thought this over a bit more. Here are my thoughts on the matter.

Basically, the major advantage of an integration test is that it really tests the “glue” tasks that pass data between the Fireworks. For example, if someone changed a glue task (say, the data format that gets passed) and updated the unit test specific to that glue task, the full workflow test may be the only thing that pinpoints that this will end up breaking certain workflows that depend on that glue task. Apart from the PassCalcLocs mentioned in your message, there are other complex data-passing tasks for things like the elastic workflows.

In response to “these don’t need to be tested over and over”: it really depends. For example, let’s pretend there are no integration tests and the data format produced by PassCalcLocs is tested only once, in a unit test for PassCalcLocs. Now say someone wants to redo the data format for PassCalcLocs. That person is reasonably sure that most workflows won’t be affected by the update, so he/she makes the change and also updates the unit test for PassCalcLocs to reflect the new format. Now all the unit tests pass, but it’s possible one of the workflows breaks, and you won’t know until you actually run through it. You could send out a mass email to everyone whose workflow uses PassCalcLocs, or carefully analyze the code of each workflow to make sure you are not breaking anything, but an integration test takes some of the human factor out of it.
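One way to mitigate that (not a replacement for the integration test, just a way to reduce the risk) is a shared “contract” test on the glue task’s output format that both the producer and the downstream consumers assert against, so a format change breaks a fast unit test on both ends. A rough sketch of the pattern, using a hypothetical glue task rather than the real PassCalcLocs, since the point is the pattern and not atomate’s exact spec keys:

```python
from fireworks import FiretaskBase, FWAction, explicit_serialize

# The "contract": the spec key and the fields every downstream consumer relies on.
# (Hypothetical; for atomate this would describe whatever PassCalcLocs actually pushes.)
CALC_LOC_KEY = "calc_locs"
CALC_LOC_FIELDS = {"name", "path", "filesystem"}


@explicit_serialize
class ToyPassCalcLocs(FiretaskBase):
    """Hypothetical glue task that records where a calculation ran."""

    required_params = ["name"]

    def run_task(self, fw_spec):
        entry = {"name": self["name"], "path": "/tmp/calc", "filesystem": None}
        return FWAction(mod_spec=[{"_push": {CALC_LOC_KEY: entry}}])


def test_glue_task_contract():
    # Producer side: the pushed entry carries every field consumers rely on.
    # Consumer-side unit tests would build their fake fw_spec from the same
    # CALC_LOC_FIELDS, so changing the format breaks tests on both ends.
    action = ToyPassCalcLocs(name="static").run_task({})
    pushed = action.mod_spec[0]["_push"][CALC_LOC_KEY]
    assert CALC_LOC_FIELDS <= set(pushed)
```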

Another minor advantage is that it really tests the whole system, including the underlying FireWorks library; for example, it makes sure the serialization/deserialization of objects into the FWS database is correct.
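Part of that can also be caught cheaply: FireWorks objects are to_dict/from_dict serializable, so a roundtrip check on a constructed workflow flags objects that won’t survive insertion into the database. A small sketch (the workflow here is again only a toy):

```python
from fireworks import Firework, ScriptTask, Workflow


def test_workflow_roundtrip():
    # Anything that can't be rebuilt from its dict form won't survive the database.
    fw = Firework([ScriptTask.from_str("echo hi")], name="roundtrip check")
    wf = Workflow([fw], name="roundtrip wf")
    rebuilt = Workflow.from_dict(wf.to_dict())
    assert rebuilt.name == wf.name
    assert len(rebuilt.fws) == len(wf.fws)
```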

As for reasons to not have them:

They are slow. The 4 workflow tests took 82% of the total test time on my computer (89s of 108 total seconds on average).

Honestly, 108 seconds is not too bad IMO. I also can’t think of any reason why these can’t be faster, or why being an integration test rather than a unit test would itself cause the slower speed. As far as I can tell, it’s basically the Raman test that eats up pretty much all the time. That seems to be a problem not with integration tests but with something going on in the Raman test, which would likely be the same issue if the Raman test were coded as a unit test of the same functionality. Is this also what you are seeing?
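For what it’s worth, if the suite can be run under pytest, a per-test timing breakdown makes it easy to confirm whether the Raman test really is the one eating the time (the path is just the workflow test directory mentioned above):

```python
import pytest

# Print a per-test timing report for just the workflow tests
# (equivalent to running: pytest --durations=0 atomate/vasp/workflows/tests)
pytest.main(["--durations=0", "atomate/vasp/workflows/tests"])
```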

They don’t test anything that couldn’t be tested in smaller unit tests that fail faster and give clearer indications to the real problem

I agree that unit tests give clearer indications of the problem. The integration tests can be a pain to debug when they fail.

They are hard to debug.

Yes I agree. In some cases, the actual problem is indeed tricky and the difficulty of debugging is somewhat justified. In other cases, the problem is simple and the debug time should have been a lot shorter, and would have been if it were a unit test.

They require input and output files that add to the size and maintenance.

Yes, I am not a fan of all these files, but again I don’t think this is really an integration test problem. Even if we were doing unit tests, we’d need example input and output files.

The bigger issue is that some workflows, like elastic, have 6 deformations. If we were doing unit tests, we’d only test 1 of them for inputs and outputs; the integration test has all 6. This can result in a stronger test if done correctly, but it also adds a ton of files. I would prefer that these tests, even as integration tests, really test just 1 of them and fake the analysis step of the remaining 5 (so that output files are not needed).
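A sketch of how “fake the analysis of the other 5” could look, using unittest.mock to return canned data from the parsing step so reference output files are only needed for one deformation. Everything here is a hypothetical stand-in; a real test would patch whatever atomate task or drone actually does the parsing:

```python
from unittest import mock


def parse_deformation(calc_dir):
    """Hypothetical stand-in for the real output-parsing step, which would
    read VASP output files from calc_dir."""
    raise RuntimeError("would need reference output files in " + calc_dir)


def collect_stresses(calc_dirs):
    return [parse_deformation(d) for d in calc_dirs]


def test_fake_all_but_one_deformation():
    fake_stress = [[0.0] * 3] * 3
    real_dir, *other_dirs = ["deformation_%d" % i for i in range(6)]
    # Patch the parser so the 5 extra deformations return canned data; only
    # real_dir would exercise real parsing against reference files on disk.
    with mock.patch(__name__ + ".parse_deformation", return_value=fake_stress):
        stresses = collect_stresses(other_dirs)
    assert stresses == [fake_stress] * 5
```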

Half of the code in each test is the same boilerplate to set up the test, run fake VASP, and access the database (code smell that we can do better)

OK, but we can clean this up. It’s not too bad IMO.

Of course, the real answer is that unit tests and integration tests were never meant to be mutually exclusive; a good end-to-end system typically has both, and one doesn’t exclude the other. Given that that is the “real” answer, I have a hard time saying the integration tests should be obliterated. But I may also have a hard time motivating people to write both.

Best,

Anubhav
