Metrics for Plain Text Acceptance Tests

There has been a lot of activity around the value of metrics for source code and tests. In the Ruby world, tools like metric_fu provide a wealth of analysis.

While working on my Cucumber talk for Rails Underground, I started investigating how we could apply metrics to the customer-focused plain text of Cucumber. For those not familiar with Cucumber, it’s an acceptance testing framework which allows non-technical people to write plain-text describing the behaviors of their system. The developers/testers then map the plain-text to tests.

Having spent time teaching people about the plain-text side of Cucumber, I often found myself recommending the same guidelines and pointing out the same plain-text anti-patterns. This led me to think about providing metrics that score the customer’s plain-text.

Why would we want plain-text acceptance test metrics?

Help plain-text beginners avoid bad practices early on.
Help improve the quality of the plain-text.
Help with quality review when new features arrive at a high rate.

Why does the quality of the plain-text matter?

Why focus on quality when the plain-text’s primary goal is to be easy for the customer to use?

The developer builds the domain-specific language by mapping plain-text to Ruby. Higher quality plain-text could make it easier to manage these mappings without any major impact on readability.
Higher quality text is easier to read, edit and understand.

Who would find it useful?

Initially, developers.

In some scenarios the developers write the features from discussions and give them to the customer to review.
Developers may tweak/review customer-written changes/features.
Developers often edit/tweak plain-text from the customer to enable reuse of existing test code.
In open source projects it is often developers who write the Cucumber features. Metrics are something they are comfortable with.

Can you measure quality in plain-text?

First, it’s important to distinguish acceptance tests from pure plain-text. Within acceptance tests we have some degree of structure, for example using Given/When/Then to describe scenarios.

Cucumber Example:

Scenario: Eating all cucumbers
  Given there are 5 cucumbers
  When I eat 5 cucumbers
  Then I should have 0 cucumbers

This structure reduces the complexity of analysing the quality of the text. It provides us with different structural elements which have different rules/guidelines on what their content should be.
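To make that concrete, here is a minimal sketch of how the structure could be pulled apart in plain Ruby before any rule is applied. It does not use Cucumber's own parser; `parse_scenario` and `STEP_KEYWORDS` are hypothetical names invented for this illustration, and it only handles a single simple scenario.

```ruby
# Hypothetical helper, not part of Cucumber: split one scenario into its
# name and a list of steps tagged with their leading keyword.
STEP_KEYWORDS = %w[Given When Then And But].freeze

def parse_scenario(text)
  scenario = { name: nil, steps: [] }
  text.each_line do |raw|
    line = raw.strip
    if line.start_with?("Scenario:")
      scenario[:name] = line.sub("Scenario:", "").strip
    else
      keyword, rest = line.split(" ", 2)
      # Skip blank lines and anything that is not a recognised step.
      next unless STEP_KEYWORDS.include?(keyword)
      scenario[:steps] << { keyword: keyword, text: rest.to_s }
    end
  end
  scenario
end

example = <<~FEATURE
  Scenario: Eating all cucumbers
    Given there are 5 cucumbers
    When I eat 5 cucumbers
    Then I should have 0 cucumbers
FEATURE

scenario = parse_scenario(example)
scenario[:steps].each { |step| puts "#{step[:keyword]}: #{step[:text]}" }
```

Once each step carries its keyword, a guideline can be written against just the Given steps, or just the Then steps, rather than against one undifferentiated block of text.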

The problem with measuring the quality of text is that it is far more subjective than measuring the quality of code. So while we cannot be absolute in our assessment of quality, we can try to codify smells that indicate areas of the text that could be improved. This is pretty much true for all metrics: they are guidelines, not absolutes (Dan North highlights the dangers of absolute metrics in the Parable of Metrics).

So what useful metrics could we look at?

Plain text Metrics

From my experience with Cucumber I would suggest examining:

Lack of Feature/Story Narrative
- We don’t know why this feature is useful and who it’s useful to!
Scenario length
- Very long scenarios may indicate that more abstraction is needed to make the scenarios easier to read (See declarative vs imperative scenarios: http://www.benmabey.com/2008/05/19/imperative-vs-declarative-scenarios-in-user-stories/)
Size of Background (like a before/setup)
- Increases conceptual overhead for each scenario. More reasons here: http://wiki.github.com/aslakhellesoy/cucumber/background
Semantically close words across features
- Can indicate ambiguity around domain terminology
Frequency of noise words (‘basically’, ‘be able to’, etc)
- Adds no value and drives me crazy.
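
As a rough illustration, a few of these checks could be sketched in plain Ruby as below. This is only a sketch under my own assumptions, not an existing tool: `feature_smells`, the noise-word list and both step limits are hypothetical and would need tuning per project. It ignores Scenario Outlines, and it leaves out the semantic-closeness check, which needs more than simple string matching.

```ruby
# Hypothetical word list and thresholds; these values are assumptions.
NOISE_WORDS = ["basically", "be able to"].freeze
MAX_SCENARIO_STEPS = 8
MAX_BACKGROUND_STEPS = 3
STEP_PATTERN = /\A(?:Given|When|Then|And|But)\b/

def feature_smells(text)
  narrative_lines = 0
  section = nil
  sections = [] # each entry: [kind, name, step_count]

  text.each_line do |raw|
    line = raw.strip
    next if line.empty?
    case line
    when /\AFeature:/
      section = :feature
    when /\ABackground:/
      section = :background
      sections << [:background, "Background", 0]
    when /\AScenario:\s*(.*)/
      section = :scenario
      sections << [:scenario, Regexp.last_match(1), 0]
    when STEP_PATTERN
      sections.last[2] += 1 unless sections.empty?
    else
      # Free text directly under "Feature:" counts as the narrative.
      narrative_lines += 1 if section == :feature
    end
  end

  smells = []
  smells << "missing feature narrative" if narrative_lines.zero?
  sections.each do |kind, name, count|
    limit = kind == :background ? MAX_BACKGROUND_STEPS : MAX_SCENARIO_STEPS
    smells << "#{name} has #{count} steps (limit #{limit})" if count > limit
  end
  lowered = text.downcase
  NOISE_WORDS.each do |word|
    count = lowered.scan(word).size
    smells << "noise word '#{word}' used #{count} time(s)" if count > 0
  end
  smells
end

feature = <<~FEATURE
  Feature: Eating cucumbers

    Scenario: Eating all cucumbers
      Given there are 5 cucumbers
      When I basically eat 5 cucumbers
      Then I should be able to have 0 cucumbers
FEATURE

smells = feature_smells(feature)
puts smells
```

The output is a list of smells to look at, not a score, which fits the idea that these metrics are guidelines rather than absolutes.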

Feedback

What do you think of the idea?

Can you think of any other useful plain-text metrics?

Joseph Wilk
