Joseph Wilk

Things with code, creativity and computation.

Limiting Red: Smarter Test Builds Through Metrics

The Current State of the Art

In the Ruby world there is a wealth of metrics that can provide insight into our code.

When it comes to metrics involving our tests we have:

  • Code coverage (Rcov)

  • Tools to help identify missed edge cases (Heckle)

  • Random testing tools (RushCheck)

Is that it? I think we can do better than that!

What useful metrics are we missing that our tests could provide and what should we be recording?

Recording Test Builds

You're using a Continuous Integration server, right? Running all your tests on every check-in to your source control repository. The CI environment is the pipeline through which all code must flow: it tends to be the place where every test is run before the code flows out into the world. That makes it the perfect environment to start capturing detailed metrics about all of our tests. It's also not the end of the world if we add a little extra time to the test build in order to capture them.
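As a sketch of the kind of recording I mean, here is a minimal recorder that a build could call once per scenario. The class name and JSON history file format are my own invention for illustration, not part of any existing tool; a real setup might post to a web service or database instead.

    require 'json'
    require 'time'

    # Hypothetical recorder: captures one outcome per scenario during a
    # build and appends the whole build to a JSON history file.
    class TestResultRecorder
      def initialize(path = 'test_results.json')
        @path    = path
        @results = []
      end

      # Record a single scenario's outcome (:passed or :failed)
      def record(scenario_name, status)
        @results << {
          'scenario'    => scenario_name,
          'status'      => status.to_s,
          'recorded_at' => Time.now.utc.iso8601
        }
      end

      # Append this build's results to the history on disk
      def flush(build_number)
        history = File.exist?(@path) ? JSON.parse(File.read(@path)) : []
        history << { 'build' => build_number, 'results' => @results }
        File.write(@path, JSON.pretty_generate(history))
      end
    end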

Mining Metrics from Test Builds

What interesting things can we discover? Here are some suggestions (a sketch of how a few might be computed follows the list):

  • Failure rates

    • Areas of your product which are prone to failure/bugs, and tests which might be fragile. Perhaps highlighting areas QA should focus extra attention on.
  • Flickering tests

    • Tests which keep flipping between failing and passing.
  • Fragile Tests

    • All-or-nothing features where either every test fails or none do.
  • Never failing tests

    • Tests which have never failed. Do we need to run them every time? Are they now redundant?
  • Average build failures a day

    • How often the build is broken.
  • Discover Shotgun Surgery

    • Small code changes broke all the tests!
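To make a few of these concrete, here is a rough sketch of how they could be computed from a recorded history of outcomes. The shape of `history` (scenario name mapped to an ordered list of per-build outcomes) is an assumption for illustration:

    # history: scenario name => ordered list of per-build outcomes
    # (oldest first). This shape is assumed for illustration.
    def failure_rate(outcomes)
      outcomes.count(:failed).to_f / outcomes.size
    end

    # A "flickering" test keeps switching between passing and failing
    def flicker_count(outcomes)
      outcomes.each_cons(2).count { |a, b| a != b }
    end

    # A test which has never failed is a candidate for running less often
    def never_failed?(outcomes)
      !outcomes.include?(:failed)
    end

    history = {
      'Sign up' => [:passed, :failed, :passed, :failed],
      'Sign in' => [:passed, :passed, :passed, :passed]
    }

    history.each do |scenario, outcomes|
      puts "#{scenario}: failure rate #{failure_rate(outcomes)}, " \
           "flickers #{flicker_count(outcomes)}, " \
           "never failed? #{never_failed?(outcomes)}"
    end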

What other metrics do you think would be useful?

Kent Beck is Smart

Kent Beck has some additional ideas; let's copy him and pretend to look smart.

Intelligent Selection of the Tests to Run

Kent Beck wrote a tool called JUnit Max, a plugin for Eclipse and JUnit which helps programmers stay focused on coding by running tests intelligently.

Max fails fast, running the tests most likely to fail first.

One of the key principles behind this tool is that:

“Tests that failed recently are more likely to fail than tests which have never failed.”

Super Fast Feedback

If we prioritise the tests that failed recently, and those recorded as being likely to fail, we increase the chance that a failure occurs early in the test build. The shorter the gap between pushing the code and knowing there is a failure, the better.

One problem this helps alleviate is a test that fails 99% of the way through the build. To know your fix worked you have to sit and wait for the entire build to run.
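Here is a sketch of one naive way to order feature files so the most failure-prone run first. The scoring, which weights recent failures more heavily, is my own guess at the idea; it is not JUnit Max's actual algorithm:

    # Score each feature by its failure history; outcomes are ordered
    # oldest to newest, so more recent failures weigh more. This
    # weighting is a guess at the idea, not JUnit Max's algorithm.
    def failure_score(outcomes)
      outcomes.each_with_index.sum { |status, i| status == :failed ? i + 1 : 0 }
    end

    history = {
      'features/payments.feature' => [:passed, :failed, :failed],
      'features/signup.feature'   => [:passed, :passed, :passed],
      'features/search.feature'   => [:failed, :passed, :passed]
    }

    # Run the most failure-prone features first
    prioritised = history.keys.sort_by { |f| -failure_score(history[f]) }
    puts prioritised
    # payments (score 5), then search (score 1), then never-failing signup

Since Cucumber runs feature files in the order they are given on the command line, the prioritised list can be passed straight to the next run.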

CukeMax (alpha-1)

CukeMax is a project that aims to:

  • Provide a web service to record Cucumber test builds

  • Provide a web based interface to uncover juicy metrics about your tests.

  • Feed recorded metrics back into the running of tests, prioritising those most likely to fail.

  • Cool stuff

CukeMax is intended to be used when you run your tests on your CI server. While this initial version only supports Cucumber, there is no reason it could not be extended to other test tools such as RSpec. I'm already using this for my own projects and I have a special version working at Songkick.com HQ.
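To give a feel for the client side, here is a sketch of reporting a finished build to a recording service over HTTP. The endpoint URL and payload shape are placeholders for illustration, not CukeMax's actual API:

    require 'net/http'
    require 'json'
    require 'uri'

    # Hypothetical client: posts a build's results as JSON to a
    # recording service. The URL and payload are illustrative only.
    def report_build(build_number, results)
      uri     = URI.parse('http://www.cukemax.com/builds') # placeholder endpoint
      request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
      request.body = { 'build' => build_number, 'results' => results }.to_json

      Net::HTTP.new(uri.host, uri.port).request(request)
    end

    report_build(42, [{ 'scenario' => 'Sign up', 'status' => 'failed' }])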

Wanna Play?

You can browse around an example of the web interface at CukeMax - www.cukemax.com

Want to be one of the first guinea pigs to try out CukeMax? Let me know.

The client tool will be leaked slowly into the world to ensure we can balance server load.

What's next?

All I can say is that there is a lot of activity around this project, with some exciting tools in the pipeline.

Also, Matt Wynne has been working on some similar ideas and we are discussing whether we can combine our thoughts.
