Limiting Red: Smarter Test Builds Through Metrics

15 Aug

The Current State of the Art

In the Ruby world there is a wealth of metrics that can provide insight into our code.

When it comes to metrics involving our tests we have:

  • Code coverage (Rcov)
  • Tools to help identify missed edge cases (Heckle)
  • Random testing tools (RushCheck)

Is that it? I think we can do better than that!

What useful metrics could our tests provide that we are missing, and what should we be recording?

Recording Test Builds

You're using a Continuous Integration server, right? One that runs all your tests at every check-in to your source control repository. The CI environment is the pipeline through which all code needs to flow. It tends to be the place where all of the tests are run before the code flows into the outside world, which makes it a perfect environment to start capturing detailed metrics about all of our tests. It’s also not the end of the world if we add a little extra time to the test build in order to capture these metrics.
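To make the idea of capturing metrics during a build concrete, here is a minimal sketch of a recorder that notes each test's name, outcome and duration, then serialises the whole build as JSON. The `BuildRecorder` class, its `record` method and the record layout are all hypothetical names made up for illustration; this is not how CukeMax or Cucumber actually does it.

```ruby
require "json"

# Hypothetical recorder: collects one entry per test during a CI build.
# The class name and record layout are assumptions for illustration only.
class BuildRecorder
  Result = Struct.new(:name, :status, :duration)

  attr_reader :results

  def initialize(build_id)
    @build_id = build_id
    @results = []
  end

  # Run the block, time it, and record whether it raised.
  def record(name)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status = begin
      yield
      :passed
    rescue StandardError
      :failed
    end
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @results << Result.new(name, status, duration)
    status
  end

  # Serialise the build so it can be stored or sent to a metrics service.
  def to_json(*_args)
    JSON.generate(
      "build_id" => @build_id,
      "results" => @results.map do |r|
        { "name" => r.name, "status" => r.status.to_s, "duration" => r.duration }
      end
    )
  end
end

recorder = BuildRecorder.new("build-42")
recorder.record("Signing up") { 1 + 1 }
recorder.record("Checking out") { raise "boom" }
puts recorder.to_json
```

The timing wrapper adds only a couple of clock reads per test, which is the sort of small overhead a CI build can usually absorb.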

Mining Metrics from Test Builds

What interesting things can we discover? Here are some suggestions:

  • Failure rates
    • Areas of your product which are prone to failure/bugs and tests which might be fragile. Perhaps highlighting areas QA should focus extra attention on.
  • Flickering tests
    • Tests that keep flipping between failing and passing.
  • Fragile Tests
    • An all-or-nothing feature where either all the tests fail or none do.
  • Never failing tests
    • Tests which have never failed. Do we need to run them all the time, or are they now redundant?
  • Average build failures a day
    • How often is the build broken?
  • Discover Shotgun Surgery
    • Small code changes broke all the tests!
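As a rough sketch of how a few of these could be computed, assume we have each test's pass/fail result for every recorded build, oldest first. The sample data, the method names and the 0.5 flicker threshold below are all assumptions picked for illustration, not anything CukeMax defines.

```ruby
# Hypothetical history: test name => results across builds, oldest first.
HISTORY = {
  "Signing up"   => %i[pass pass pass pass pass pass],
  "Checking out" => %i[pass fail pass fail pass fail],
  "Searching"    => %i[pass pass fail fail pass pass]
}.freeze

# Fraction of recorded runs that failed.
def failure_rate(runs)
  return 0.0 if runs.empty?

  runs.count(:fail).fdiv(runs.size)
end

# Number of times the result changed between consecutive builds.
def flips(runs)
  runs.each_cons(2).count { |a, b| a != b }
end

# A test "flickers" when it changes state in at least `threshold` of its
# build-to-build transitions. The threshold is an arbitrary assumption.
def flickering?(runs, threshold: 0.5)
  return false if runs.size < 2

  flips(runs).fdiv(runs.size - 1) >= threshold
end

# True when the test has no recorded failure at all.
def never_failed?(runs)
  runs.none?(:fail)
end

HISTORY.each do |name, runs|
  puts format("%-14s failure rate %.2f flickering=%s never failed=%s",
              name, failure_rate(runs), flickering?(runs), never_failed?(runs))
end
```

With this sample data only "Checking out" counts as flickering: it changes state on every build, while "Searching" fails in one solid block and then recovers.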

What other metrics do you think would be useful?

Kent Beck is Smart

Kent Beck has some additional ideas, so let's copy him and pretend to look smart.

Intelligent Selection of the Tests to Run

Kent Beck wrote a tool called JUnit Max, a plugin for Eclipse and JUnit that helps programmers stay focused on coding by running tests intelligently.

Max fails fast, running the tests most likely to fail first.

One of the key principles behind this tool is that:

“Tests that failed recently are more likely to fail than tests which have never failed.”

Super Fast Feedback

If we prioritise the tests that failed recently and those which have been recorded as being likely to fail, we increase the chance that a failure occurs early on in the test build. The shorter the time between pushing the code and knowing there is a failure, the better.
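One way to sketch that ordering, assuming we have each test's pass/fail history with the oldest build first: sort by how recently a test last failed, and break ties by its overall failure rate. This is only my illustration of the principle, not JUnit Max's actual algorithm, and the data and method names are made up.

```ruby
# Hypothetical history: test name => results across builds, oldest first.
RUN_HISTORY = {
  "Signing up"   => %i[pass pass pass pass],
  "Checking out" => %i[pass pass pass fail],
  "Searching"    => %i[fail fail pass pass],
  "Logging in"   => %i[pass fail pass pass]
}.freeze

# Builds since the most recent failure (0 = failed in the latest build).
# Tests that have never failed get Float::INFINITY so they sort last.
def builds_since_failure(runs)
  index = runs.rindex(:fail)
  index ? runs.size - 1 - index : Float::INFINITY
end

# Fraction of recorded runs that failed.
def failure_rate(runs)
  runs.empty? ? 0.0 : runs.count(:fail).fdiv(runs.size)
end

# Most recently failed first; ties broken by the higher failure rate.
def prioritise(history)
  history.keys.sort_by do |name|
    runs = history[name]
    [builds_since_failure(runs), -failure_rate(runs)]
  end
end

puts prioritise(RUN_HISTORY)
```

For this sample the order comes out as Checking out, Searching, Logging in, Signing up. The test that broke in the latest build runs first and the one that has never failed runs last.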

One problem this helps alleviate is a test that fails 99% of the way through the build. To know your fix worked you have to sit and wait for the entire build to run.

CukeMax (alpha-1)

CukeMax is a project that aims to:

  • Provide a web service to record Cucumber test builds
  • Provide a web-based interface to uncover juicy metrics about your tests
  • Feed recorded metrics back into the running of tests, prioritising those most likely to fail
  • Cool stuff

CukeMax is intended to be used when you run your tests on your CI server. While this initial version just supports Cucumber, there is no reason why it cannot be expanded to other test tools such as RSpec. I’m already using this for my own projects and I have a special version working at Songkick.com HQ.

Wanna Play?

You can browse around an example of the web interface at CukeMax - www.cukemax.com

Want to be one of the first guinea pigs to try out CukeMax? Let me know.

The client tool will be leaked slowly into the world to ensure we can balance server load.

What's next?

All I can say is there is a lot of activity around this project, with some exciting tools in the pipeline.

Also, Matt Wynne has been working on some similar ideas and we are discussing whether we can combine our thoughts.

  • If you are still interested, Jeroen, drop me an email and I'll give you access to the client gem. Thanks!
  • I don't think there is a hard and fast rule. If a bug is reported in Scenario form I'm quite likely to use Cucumber to drive out that bug. Otherwise I tend to use features/scenarios for bugs if they are complex, hard to describe or need some level of discussion with clients. Most of the time bugs can be associated with a Feature not behaving in the correct manner (and hence not realising its value). So often these bug examples might end up being attached to an existing feature. If I have a bug I know exactly how to fix, or I deem it too expensive to test at the acceptance test level, I tend to drop down to a spec rather than use features. Hope that provides some insight!
  • As Joe mentions in his blog post, we've both been working on a very similar tool, though neither of us realised it until we met up last week!

    I’m hoping Joe and I can figure out a way to combine our efforts into something amazing. In the meantime, if you'd like to see what I've been up to, check out this blog post: http://blog.mattwynne.net/2010...

    Or this screencast: http://www.youtube.com/watch?v...
  • I would love for this to support lots of platforms and environments, so it's definitely something we will be working towards.
  • Thanks for spotting that. As for this becoming a commercial web service, it really depends on demand and the amount of hardware that is required to support that demand. While it's very early days, it is a goal to provide free access for open source projects.
    To anyone who's shown an interest: I'll keep you posted. Thanks!
  • Michael Heinrich
    Hello Joseph,

    CukeMax sounds great! I too would like to try this out. Are there any plans to branch CukeMax out to other platforms/environments? (Work on Microsoft .NET only.)

    Thank you!
  • joahking
    good post, thanks. One a-bit-off-topic question though: in http://www.cukemax.com/project... I see you have features/bugs/ directory with features in it.
    What's your take on this? Are you handling bugs as things to show to your clients? I normally (for speed reasons) even write "security, you cannot access" features but I am not 100% sure where they fit in the full features suite... Are these issues (bugs, access-denied rules) to be written in cucumber? what's your opinion on this? how do you do it?
  • Jeroentjevandijk
    Very cool Joseph! I would very much like to try this out. Will it be a commercial webservice in the end? Looking forward to where this is going!

    Btw, a small note, the link www.cukemake.com is missing "http://" and therefore not working correctly