How to test software that uses external command line tools
I am trying to figure out how to test software that launches external processes which take file paths as input and, after lengthy processing, write output to stdout or to a file. Are there common patterns for writing tests in situations like this? It is difficult to create fast test cases that verify the correct use of external tools without running the real tools in the tests and checking their results.
You can memoize ( http://en.wikipedia.org/wiki/Memoization ) the external processes. Write a Ruby wrapper that calculates the MD5 sum of the input file and checks it against a database of known checksums. If it matches one, copy the recorded output; otherwise, call the tool as usual.
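A minimal sketch of such a wrapper, assuming a cache directory keyed by the MD5 of the input file; `CACHE_DIR`, `run_tool`, and the `expensive_tool` command are illustrative names, not part of the question.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Directory of recorded outputs, keyed by the MD5 of the input file.
CACHE_DIR = File.join(Dir.tmpdir, "tool_cache")

def run_tool(input_path)
  FileUtils.mkdir_p(CACHE_DIR)
  key = Digest::MD5.file(input_path).hexdigest
  cached = File.join(CACHE_DIR, key)

  # Known checksum: reuse the recorded output instead of running the tool.
  return File.read(cached) if File.exist?(cached)

  # Unknown checksum: call the real tool and record its output for next time.
  output = `expensive_tool #{input_path}`
  File.write(cached, output)
  output
end
```

The first run against a new input is as slow as the real tool; every run after that is a file read, which is what makes the test suite fast.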
Test to your boundaries. In your case, the boundary is the command line you build to invoke the external program (which you can capture by monkeypatching). If you consume that program's stdout (or process its output by reading files), that is a different boundary, and the test there is whether your program can handle that input.
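Capturing the command line by monkeypatching can be sketched like this, assuming the code under test goes through `system` (the `Converter` class and `convert_tool` command are hypothetical):

```ruby
# Code under test: builds a command line and hands it to the OS.
class Converter
  def convert(path)
    system("convert_tool", "--input", path, "--format", "png")
  end
end

# In the test, monkeypatch #system on the instance so the command is
# recorded instead of executed.
captured = nil
converter = Converter.new
converter.define_singleton_method(:system) do |*args|
  captured = args
  true # pretend the tool succeeded
end

converter.convert("photo.raw")
# captured now holds ["convert_tool", "--input", "photo.raw", "--format", "png"]
```

The assertion is then purely about the command line, the boundary, rather than about what the real tool does with it.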
The 90% answer is to mock the external command line tools and verify that the right input is passed to them across the interface that separates them from your code. This keeps the test suite fast. It also means you are not testing the command line tools themselves, since they are not "your code under test"; without the mock, a failing unit test could be caused either by changes in your code or by changes in the command line utility.
But it sounds like you are having trouble determining what the "correct input" is, in which case an optimization like memoization (as Dave suggests) can give you the best of both worlds.
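One way to sketch that mocking, assuming the code under test depends on a small injectable "runner" object rather than shelling out directly (`Indexer`, `FakeRunner`, and the `indexer` command are illustrative names):

```ruby
# Code under test: counts lines in the tool's output. The runner is
# injected, so tests never touch the real executable.
class Indexer
  def initialize(runner)
    @runner = runner
  end

  def index(path)
    @runner.run(["indexer", "--file", path]).lines.count
  end
end

# Test double: records the command it was given and returns canned output.
class FakeRunner
  attr_reader :last_command

  def run(command)
    @last_command = command
    "one\ntwo\nthree\n"
  end
end

fake = FakeRunner.new
result = Indexer.new(fake).index("data.csv")
# result == 3; fake.last_command shows exactly what would have been invoked
```

The fake lets you assert on both sides of the interface at once: the command your code built, and how it handled the output it got back.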
I think you are running into a common problem with unit testing: correctness here is really determined by whether the integration works, so how does a unit test help?
The main answer is that unit tests show that the parameters you intend to pass to the command line tool are actually passed that way, and that the results you expect to get back are actually handled the way you plan to handle them.
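The "results are handled as planned" half can be unit tested by feeding canned tool output into your parser; the "name: value" line format and `parse_tool_output` below are assumptions for illustration:

```ruby
# Parse tool output of the form "name: value" into a hash of integers.
def parse_tool_output(stdout)
  stdout.each_line.map do |line|
    name, value = line.strip.split(": ", 2)
    [name, Integer(value)]
  end.to_h
end

# Canned stdout stands in for the real tool in the unit test.
canned = "errors: 0\nwarnings: 4\n"
parse_tool_output(canned) # => {"errors" => 0, "warnings" => 4}
```

No external process runs here, yet the test still pins down exactly how your code interprets what the tool emits.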
Then there is a second level of tests, ideally automated (though that depends on how practical it is), at the functional level, where the real utilities are invoked so you can check that what you intend to pass, and what you expect back, matches what actually happens.
Also, there would be nothing wrong with a test suite that "checks" the external tools (perhaps running on a different schedule, or only when those tools are updated) by encoding your assumptions: pass in known input and assert that the known result comes back. That way, if you upgrade a tool, you catch any behavior changes that might affect you.
Whether to maintain this last test suite is up to you; it depends very much on the tools involved.