Joseph Wilk

Things with code, creativity and computation.

Specing Cucumber Step Definitions

Testing your tests sounds kind of crazy. However, when writing a library of Cucumber step definitions that will be used across many projects, it started to make sense to test my tests.

  • The step definitions are the code.

  • It helps reduce fear of breaking lots of projects which use the steps.

  • The tests/specs show examples of how to use the step definitions.

It is important to note that I’m not implying TDD/BDDing these step definitions. My use case is adding tests afterwards, when it comes time to extract them into a library.

How to test step definitions

Exercise the full step (with Rspec)

The common way of testing complex steps is to extract all the Ruby from the step definitions and then test just that. But this approach does not exercise the step definitions from the outside, as close as possible to how they will actually be used. It also provides no examples of how to use the step definitions.

So I sat down with Matt Wynne, who started this discussion, and we thrashed out some Rspec macros for testing whole step definitions.

If we were testing this step definition (icalendar_steps.rb):

require 'icalendar'

module Cucumber
  module Stepdefs
    module Icalendar
      def response_calendars
        ::Icalendar.parse(response.body)
      end

      def response_events
        response_calendars.length.should == 1
        response_calendars.first.events
      end
    end
  end
end

Before('@ical') do
  extend Cucumber::Stepdefs::Icalendar
end

Then /^the iCalendar should have exactly (\d+) events?$/ do |number_of_events|
  response_events.length.should == number_of_events.to_i
end

Our spec would look like this (Note: we test the Before hook as well as the step definition):

describe 'icalendar_steps' do
  step_file File.dirname(__FILE__) + '/../../../lib/cucumber/stepdefs/icalendar_steps'

  # Test that the Before hook is not called when there is no tag
  without_tags do
    it "should not mix in any calendar related methods" do
      world_methods.should_not include('response_calendars')
      world_methods.should_not include('response_events')
    end
  end

  # Test the Before hook mixes in the right methods when tagged with @ical
  with_tag '@ical' do
    ['response_calendars', 'response_events'].each do |method|
      it "should add the #{method} to world" do
        world_methods.should include(method)
      end
    end

    the_step "the iCalendar should have exactly 1 event" do
      describe "when 1 calendar with 0 events is in the response body" do
        before(:each) do
          world.stub!(:response).and_return(mock_response_with_0_events)
        end

        it_should_fail_with(Spec::Expectations::ExpectationNotMetError)
      end

      describe "when 1 calendar with 1 event is in the response body" do
        before(:each) do
          world.stub!(:response).and_return(mock_response_with_1_event)
        end

        it_should_pass
      end
    end
  end
end

Experiment’s Source code

You can see the source on Github:

git clone git://github.com/mattwynne/cucumber-step_definitions.git

If this experiment proves successful, these macros will make their way into a nice gem.

Pairwise Testing With Cucumber

Combinatorial testing is a difficult problem. Even a small number of inputs can result in a combinatorial explosion of possible permutations. In Cucumber we see this problem in Scenario Outlines, where the Examples table can require a large number of rows.

We want to reduce the combinations to a more manageable size while still providing effective fault detection.

Pairwise testing provides one method of achieving this. However, effective fault detection depends on how well Pairwise suits the data/system; it is not guaranteed.

What is Pairwise testing?

Pairwise testing (also called All-pairs testing or 2-way testing) is a way of generating a test suite which covers every combination of any two input values, and is therefore much smaller than an exhaustive suite.

To put this in perspective, a system of 75 binary inputs would require only 28 combinations with Pairwise testing. Exhaustive testing would require 2^75 = 37,778,931,862,957,161,709,568 combinations!

Sounds great, but how does the method preserve good defect detection?

Pairwise relies on a simple principle:

“most faults are caused by interactions of at most two factors” (The Combinatorial Design Approach to Automatic Test Generation)

Pairwise focuses on the minimal set of inputs (1 or 2 interactions) that cover the most likely causes of faults.

So what’s the catch?

Pairwise does not guarantee coverage of all faults of at most two factors. It only covers faults reachable from the inputs and values you select when generating the Pairwise set. Also, the general principle that faults are caused by at most two factors does not guarantee that your data set’s faults follow that distribution, and hence Pairwise testing might miss a proportion of faults.

Pairwise Cucumber example

Here is a scenario taken from a web based system which deals with sports events.

Scenario Outline: Visiting events with another events media
  Given I have a <event without media>
  And I have a <media item> attached to <event with media>
  When I go to the <media item> page for the <event without media>
  Then I should be redirected
  And I should see the <media item> in the <event with media>
  Examples:
    |media item|event without media|event with media|
    ...

The permutations for the Examples table cells are:

media item: [Image, Video, Music]
event with media: [Football, Basketball, Soccer]
event without media: [Football, Basketball, Soccer]

There are a total of 27 possible permutations. So let’s see which permutations Pairwise testing would suggest.

To generate the combinations I used a Ruby-based tool I’ve written called Pairwise, which uses the in-parameter-order Pairwise generation strategy (http://ranger.uta.edu/~ylei/paper/ipo-tse.pd).

This tool outputs a table ready to be used in a Cucumber feature:

 | media item | event without media | event with media |
 | Image      | Football            | Football         |
 | Image      | Basketball          | Basketball       |
 | Image      | Soccer              | Soccer           |
 | Video      | Football            | Soccer           |
 | Video      | Basketball          | Football         |
 | Video      | Soccer              | Basketball       |
 | Music      | Football            | Basketball       |
 | Music      | Basketball          | Soccer           |
 | Music      | Soccer              | Football         |

That’s 9 permutations covering all possible input pairs.

Giving the final Scenario:

Scenario Outline: Visiting events with another events media
  Given I have a <event without media>
  And I have a <media item> attached to <event with media>
  When I go to the <media item> page for the <event without media>
  Then I should be redirected
  And I should see the <media item> in the <event with media>
Examples:
  | media item | event without media | event with media |
  | Image      | Football            | Football         |
  | Image      | Basketball          | Basketball       |
  | Image      | Soccer              | Soccer           |
  | Video      | Football            | Soccer           |
  | Video      | Basketball          | Football         |
  | Video      | Soccer              | Basketball       |
  | Music      | Football            | Basketball       |
  | Music      | Basketball          | Soccer           |
  | Music      | Soccer              | Football         |
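
To see why so few rows suffice, here is a naive greedy sketch of pairwise generation for the example above. This is my own illustration, not the Pairwise gem (which uses the more sophisticated in-parameter-order strategy); the `required` and `rows` names are invented for the sketch.

```ruby
# Naive greedy pairwise generation for the sports-event example.
params = [
  %w[Image Video Music],            # media item
  %w[Football Basketball Soccer],   # event without media
  %w[Football Basketball Soccer]    # event with media
]

# Every (parameter, value) pair that must appear together in some row.
required = []
(0...params.size).to_a.combination(2).each do |i, j|
  params[i].product(params[j]).each { |a, b| required << [i, a, j, b] }
end

all_rows = params[0].product(*params[1..-1])  # the 27 exhaustive rows

rows = []
until required.empty?
  # Greedily pick the row covering the most still-uncovered pairs.
  best = all_rows.max_by do |row|
    required.count { |i, a, j, b| row[i] == a && row[j] == b }
  end
  rows << best
  required.reject! { |i, a, j, b| best[i] == a && best[j] == b }
end

puts rows.size  # far fewer than the 27 exhaustive rows
```

Each chosen row can knock out up to three uncovered pairs at once (one per column pairing), which is why the covering set stays close to 9 rows for this 3×3×3 input space.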

Selecting the right values to test with

In Cucumber we encounter plain text steps which use high level language to hide low level details that are not important for the scenario. For example:

Given a default configuration

By abstracting and pushing the data out of the plain text we are also hiding the values from any Pairwise generation. It is therefore important to realise there will be sets of uncovered pairs and hence potential faults missed.

In deciding what is exposed in the Scenario Outline (as columns in the Examples table) you are selecting which inputs you want to focus the Pairwise generation around, and in turn the input space you want to limit the search for faults to.

When Pairwise testing fails

Pairwise testing is not suitable for all data sets. With any technique that throws away test cases based on a pattern, you lose some degree of resolution and may misfit your data, missing important faults.

Here are some cases where Pairwise testing may not be effective.

You have probable combinations which you want to focus on.

If certain combinations of inputs have far more commonality or importance, you would want to focus your testing data around these. Pairwise testing ignores any such importance.

You don’t know how the input variables interact

Pairwise testing assumes each input value carries the same significance for the outcome. If certain inputs have greater influence over the outcome, you would want to focus your test data around these.

Your data requires higher order test combinations

Not all data sets follow the distribution of most faults being caused by at most two factors. You may have a larger number of faults outside the 2-parameter range. Using higher-order test data combinations (such as 3-wise, 4-wise, etc.) or completely different test methods may be required to reduce the risk of missing faults.

The real difficulty here is knowing beforehand the distribution of faults.

You want a best practice

There are no testing “best practices” that you can simply “follow” in order to achieve success.

To quote James Marcus Bach:

“blindly applying Pairwise testing to combinatorial testing problems may increase the risk of delivering faulty software.” (http://www.testingeducation.org/wtst5/PairwisePNSQC2004.pdf)

Your faults mean people are going to die

In realtime or safety-critical systems, all faults, irrespective of the number of interacting parameters, need to be detected.

Final words on Pairwise

Shortcuts to combinatorial problems in testing carry their pitfalls. No matter how you reduce the input permutations, you should spend time understanding your combinatorial data set. This is especially the case with Pairwise testing, which does not consider the relationships the inputs have with the outputs. It’s up to you to examine any interactions or strong influences in order to evaluate Pairwise’s suitability. Used thoughtfully, with due consideration of its limitations and pitfalls, Pairwise testing can be a powerful tool for testers and Cucumberists alike.

Pairwise tools

Further Reading

Metrics for Plain Text Acceptance Tests

There has been lots of activity around the value of metrics for source code and tests. In the Ruby world tools like metric_fu provide a wealth of analysis.

While working on my Cucumber talk for Rails Underground I started investigating how we could apply metrics to the customer focused plain text of Cucumber. For those not familiar with Cucumber it’s an acceptance testing framework which allows non-technical people to write plain-text describing the behaviors of their system. The developers/testers map the plain-text to tests.

Having spent time teaching people about the plain-text side of Cucumber, I often found myself recommending the same guidelines and warning about the same plain-text anti-patterns. This led me to think about providing metrics scoring the customer’s plain-text.

Why would we want plain-text acceptance test metrics?

  • Help plain-text beginners avoid bad practices early on.

  • Help improve the quality of plain-text

  • Help review quality when there is a high frequency of incoming features

Why does the quality of the plain-text matter?

Why focus on quality, when the plain-text’s primary goal is to be easy for the customer to use?

  • The developer builds the domain-specific language by mapping plain text to Ruby. Higher quality plain-text could make it easier to manage these mappings without any major impact on readability.

  • Higher quality text is easier to read, edit and understand.

Who would find it useful?

Initially Developers.

  • In some scenarios the developers write the features from discussions and give to the customer to review.

  • Developers may tweak/review customer written changes/features.

  • Developers often edit/tweak plain-text from the customer to enable reuse of existing test code.

  • In open source projects often developers write Cucumber features. Metrics are something they are comfortable with.

Can you measure quality in plain-text?

First it’s important to distinguish acceptance tests from pure plain-text. Within acceptance tests we have some degree of structure, for example using Given/When/Then to describe scenarios.

Cucumber Example:

Scenario: Eating all cucumbers
  Given there are 5 cucumbers
  When I eat 5 cucumbers
  Then I should have 0 cucumbers

This structure reduces the complexity of analysing the quality of the text. It provides us with different structural elements which have different rules/guidelines on what their content should be.

The problem with measuring the quality of text is that it is far more subjective than in code. So while we cannot be absolute in our assessment of quality, we can try to codify smells that could indicate areas of the text that could be improved. This is true of pretty much all metrics: they are guidelines, not absolutes (Dan North highlights the dangers of absolute metrics in the Parable of Metrics).
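
As a toy illustration of codifying a smell, a check might count the steps in a scenario and flag overly long ones. The regex and the `MAX_STEPS` threshold here are my own invention, not taken from any real metrics tool:

```ruby
# Hypothetical smell check: flag scenarios with too many steps.
MAX_STEPS = 7

feature = <<~GHERKIN
  Scenario: Eating all cucumbers
    Given there are 5 cucumbers
    When I eat 5 cucumbers
    Then I should have 0 cucumbers
GHERKIN

# Count lines that start with a Gherkin step keyword.
step_count = feature.lines.count { |line| line =~ /^\s*(Given|When|Then|And|But)\b/ }

puts "Smell: #{step_count} steps (max #{MAX_STEPS})" if step_count > MAX_STEPS
puts step_count  # => 3
```

Because Given/When/Then give the text structure, even a check this crude can target specific structural elements rather than free prose.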

So what useful metrics could we look at?

Plain text Metrics

From my experience with Cucumber I would suggest examining:

Feedback

What do you think of the idea?

Can you think of any other useful plain-text metrics?

Ruby Metric-fu Hudson Plugin

I have written a plugin for the continuous integration server Hudson which uses a metric-fu rake task at its core to build and present graphs representing different metrics over successful builds. It currently supports:

Hudson with RubyMetricFu graphs

The source is available on Github:

http://github.com/josephwilk/rubymetricfu

Installing

Currently all Hudson’s plugins are stored in something called SVN. Being more of a GIT myself, you have to manually install the plugin rather than using the automatic Hudson GUI install method.

Install steps:

  1. Ensure you have the Ruby and Rake Hudson plugins installed.

  2. Follow the metric-fu installation guide (http://metric-fu.rubyforge.org/)

  3. Ensure project code has a metrics:all rake task (auto added when you require metric-fu)

  4. Download the rubymetricfu.hpi plugin file

  5. Copy the file into the plugins folder within your Hudson install. Hudson’s default is ~/.hudson/plugins

  6. Restart Hudson

  7. Go to the ‘configure’ link for a project and select the Ruby Metric-fu report option (see below)

  8. (Optionally) Pick which Rake version you want to use.

Setup Metric-fu

Future

  • Further metrics to graph:

  • Configure which metrics you want on your project page.

  • Create a Crap4R meter using Rcov and Flog (Similar to Crap4J).

  • Better integration with the html reports generated by metric-fu.

JVM Call to Arms With Cucumbers

“Cucumber needs you to experiment with your favourite Java Virtual Machine based language and connect to Cucumber via JRuby.”

What’s this Cucumber you speak of? Checkout: http://cukes.info/

Wait, that’s Ruby. How do I use a JVM-based language to play with it?

Cuke4Duke (http://wiki.github.com/aslakhellesoy/cuke4duke) allows writing Cucumber step definitions in Java. This means Java developers can use the Cucumber tool without having to write any Ruby.

Ruby step definitions

Given /I have (\d+) cukes in my belly/ do |n|
  @belly ||= []
  n.to_i.times {|i| @belly << "cuke"}
end

Equivalent Java step definitions

package cukes;

import cuke4duke.Given;
import cuke4duke.Steps;
import java.util.List;
import java.util.ArrayList;

@Steps
public class BellySteps {
    private List<String> belly = new ArrayList<String>();

    @Given("I have (\\d+) cukes in my belly")
    public void bellyCukes(int cukes) {
        for(int i = 0; i < cukes; i++) {
            belly.add("cukes");
        }
    }
}

This works by connecting Java to Cucumber via JRuby (http://jruby.codehaus.org/).

So in theory if your language runs on the JVM you can use JRuby, and hence use Cucumber and its wonderful Gherkin language.

So what are you waiting for! Pick up your favourite JVM language and arm yourself with Cucumber!

Some Example JVM languages:

Good luck and safe Cuking

FutureRuby Talk: Cucumbered

FutureRuby was an exceptional conference and I was excited to be a part of such a creative group of people. I talked about Cucumber, looking at what it is, how to use it, and why to use it. Useful links for Cucumber:

I demonstrated using Cucumber to test a simple iPhone application. To make the iPhone testing a little more palatable I used a little gem called IRobat. It’s very rough around the edges and in no way complete, but you can take a look at the IRobat code on Github.

No Cucumbers were harmed (just mildly shaken up) in the making of this presentation.

Cucumbered

View more documents from Joseph Wilk.

What people were saying about the talk:

FutureRuby Cucumber Twitter Talk


Cucumber, Tags and Continuous Integration Oh My!

We want to be able to commit our code frequently to prevent merge headaches.

“the longer you wait, the more your code will diverge from your teammates. If you don’t commit often you rob them of the opportunity to reduce merge hell.” (Aslak Hellesøy)

When dealing with Cucumber Features/Scenarios we may find we want to commit part way through a scenario, but we won’t because:

  1. We don’t want to break the build
  2. We don’t want to pollute the build with lots of pending steps.

A common solution to this problem is to create two streams for running the features:

  1. In-progress

    • If an in-progress scenario fails then the build carries on.
    • If an in-progress scenario passes then the build fails (This is very similar to how Rspec works with pending)
  2. Finished

    • If a completed scenario fails it causes the build to fail.

We can implement this model using Cucumber’s new Tag feature. We can tag Scenarios and Features with @in-progress and use this tag to help exclude in-progress Features/Scenarios from the finished build.

@in-progress
Feature:
  In order to avoid merge headaches
  As a developer
  I want to tag my features and scenarios with a in-progress tag

  @in-progress
  Scenario: I'm not finished yet
    Given ...
    When ...
    Then ...

The Rake tasks

Finished features/scenarios task

We prefix a tag with ~ to exclude features or scenarios carrying that tag.

  desc "Run finished features"
  Cucumber::Rake::Task.new(:finished) do |t|
    t.cucumber_opts = "--format progress --tags ~in-progress"
  end

In-progress features/scenarios task

  desc "Run in-progress features"
  Cucumber::Rake::Task.new(:in_progress) do |t|
    t.cucumber_opts = "--require formatters/ --format Cucumber::Formatter::InProgress --tags in-progress"
  end

We require a special formatter, Cucumber::Formatter::InProgress, which is essential for making the task work. As well as giving helpful output, this formatter changes Cucumber’s command-line exit codes. This is kind of crazy, but only within the formatter do we have enough information to decide whether we should fail or pass. The formatter returns a failure exit code only if any scenarios passed. So unlike the default exit codes, failing steps will not cause a failure exit code.
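
The override itself relies on a Ruby subtlety: calling Kernel.exit inside an at_exit block replaces the exit status the process would otherwise have had. A minimal demonstration, run in a subprocess so we can observe the resulting status:

```ruby
# A process that calls exit(0) but whose at_exit hook forces status 1 --
# the same trick the InProgress formatter uses to override Cucumber's
# normal exit code.
system(%q{ruby -e 'at_exit { Kernel.exit(1) }; exit(0)'})
status = $?.exitstatus
puts status  # => 1
```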

Full Source Code

Also available at: http://github.com/josephwilk/cucumber_cocktails/tree/master

Rake Task

require 'cucumber/rake/task'

class BuildFailure < Exception
  def initialize(message = nil)
    message ||= "Build failed"
    super(message)
  end
end

Cucumber::Rake::Task.new do |t|
  t.cucumber_opts = "--format progress"
end

namespace :features do
  desc "Run finished features"
  Cucumber::Rake::Task.new(:finished) do |t|
    t.cucumber_opts = "--format progress --tags ~in-progress"
  end

  desc "Run in-progress features"
  Cucumber::Rake::Task.new(:in_progress) do |t|
    t.cucumber_opts = "--require formatters/ --format Cucumber::Formatter::InProgress --tags in-progress"
  end
end

desc "Run complete feature build"
task :cruise do
  finished_successful = run_and_check_for_exception("finished")
  in_progress_successful = run_and_check_for_exception("in_progress")

  unless finished_successful && in_progress_successful
    puts
    puts("Finished features had failing steps") unless finished_successful
    puts("In-progress Scenario/s passed when they should fail or be pending") unless in_progress_successful
    puts
    raise BuildFailure
  end
end

def run_and_check_for_exception(task_name)
  puts "*** Running #{task_name} features ***"
  begin
    Rake::Task["features:#{task_name}"].invoke
  rescue Exception
    return false
  end
  true
end

InProgress Formatter

module Cucumber
  module Formatter
    class InProgress < Progress
      FAILURE_CODE = 1
      SUCCESS_CODE = 0

      FORMATS[:invalid_pass] = Proc.new{ |string| ::Term::ANSIColor.blue(string) }

      def initialize(step_mother, io, options)
        super(step_mother, io, options)
        @scenario_passed = true
        @passing_scenarios = []
        @feature_element_count = 0
      end

      def visit_feature_element(feature_element)
        super

        @passing_scenarios << feature_element if @scenario_passed
        @scenario_passed = true
        @feature_element_count += 1

        @io.flush
      end

      def visit_exception(exception, status)
        @scenario_passed = false
        super
      end

      private

      def print_summary
        unless @passing_scenarios.empty?
          @io.puts format_string("(::) Scenarios passing which should be failing or pending (::)", :invalid_pass)
          @io.puts
          @passing_scenarios.each do |element|
            @io.puts(format_string(element.backtrace_line, :invalid_pass))
          end
          @io.puts
        end
        print_counts

        if @passing_scenarios.empty?
          override_exit_code(SUCCESS_CODE)
        else
          override_exit_code(FAILURE_CODE)
        end
      end

      def override_exit_code(status_code)
        at_exit do
          Kernel.exit(status_code)
        end
      end

    end
  end
end

Outside-in Development With Cucumber and Rspec

I was speaking in Edinburgh at Scotland on Rails 2009 about Cucumber and Rspec.

You can watch the recorded full talk.

I’ve also posted the slides from the presentation and uploaded the screencasts used in the presentation in both high and low resolutions. They are accessible from links within the presentation.

Here are some of the useful links from the presentation:

I would like to thank the organisers, everyone who came to listen and speak at the conference. It was a pleasure to be a part of such an enthusiastic group of people in such a beautiful venue.

Cucumber Waves Goodbye to GivenScenario

Cucumber, as of version 0.2, has removed the GivenScenario feature. GivenScenario was introduced in the original story runner to allow one scenario to be called from another within the same feature.

Scenario: setup
  Given ...

Scenario: example
  GivenScenario setup
  ...

I initially thought this was a great feature (as mentioned in Rspec stories from the trenches). However, when I was using GivenScenario I was thinking as a programmer, which is usually where you start going off the tracks when writing features.

Let’s look at an example of GivenScenario:

Scenario: setup
  Given an author "joe"
  And pages created by "joe":
    | title |    content   |
    | cuke  | just cuke it |

Scenario: search pages
  GivenScenario setup
  When I search for "cuke"
  Then I should see "just cuke it"

Scenario: expanding search result details
  GivenScenario search pages
  When I click "show details" for the first search result
  Then I will see the author "joe"

This example highlights a number of problems with GivenScenario:

Difficult to understand isolated scenarios

While the example might make sense when working sequentially through the scenarios, in the longer term it produces scenarios that are hard to maintain and understand. Often during the life cycle of a system you will come back to an isolated scenario, but in this example you would have to trace through the stack of scenarios to understand the context.

Creating setups which pretend to be scenarios

When creating a Scenario to use in GivenScenario often what you are actually expressing is a setup block and not a true scenario.

Noise in the output

If you re-use scenarios that contain the normal Given/When/Then, it introduces a lot of noise in the output of the features. This is especially the case with multiple ‘Then’s, each testing different concepts. Running our example above outputs:

  ...
  Scenario: expanding search result details
    Given an author "joe"
    And pages created by "joe"
      | title |    content   |
      | cuke  | just cuke it |
    When I search for "cuke"
    Then I should see "just cuke it"
    When I click "show details" for the first search result
    Then I will see the author "joe"
  ...

Ugly Camel case

Since the reserved word Given was already used, we ended up with GivenScenario, something that was not natural for non-technical users.

Summary

GivenScenarios are like procedure calls, but they ended up with too much power. This led to a feature which was easy to misuse and did not fit cleanly with Scenarios and plain text.

Solutions to GivenScenario

There are two solutions to replace ‘GivenScenario’:

Calling Steps from Step Definitions

The ability to call existing steps from step definitions allows us to introduce a hierarchy of abstraction in our steps.

We can have a high level Step:

  Given a basic site  

Which in turn uses our other steps.

Given /a basic site/ do
  Given 'an author "joe"'
  Given 'pages created by "joe"', table(%{
    | title |    content   |
    | cuke  | just cuke it |
  })
end

In contrast to ‘GivenScenario’, this solution pushes the logic down into the step definitions, moving it out of the customer’s domain and into the developer’s domain.

This reduces the noise in the feature text allowing us to focus on the real feature and value.

However, abstraction is not always the right pattern when trying to add context to your features. There is a danger that high-level steps could lead to confusion, since they push details out of sight of the customer.

Background

Background defines a set of steps which are implicitly run before every scenario.

This is similar to ‘GivenScenario’ but has some key differences which help overcome some of the problems mentioned:

  • Only one Background per feature - No chain of background dependencies are possible

  • Background has to be the first feature element in the Feature - We always know where to look for context.

  • It’s implicitly called in every scenario. - There is no extra noise in the scenarios.

  • It’s not shown in the output of each scenario (unless there are errors in the background) - There is no extra noise in the output.

Example:

Background:
  Given an author "joe"
  And pages created by "joe"
    | title |    content   |
    | cuke  | just cuke it |

Scenario: search pages
  When I search for "cuke"
  Then I should see "just cuke it"

Scenario: expanding search result details
  When I search for "cuke"
  And I click "show details" for the first search result
  Then I will see the author "joe"