Blog Posts tagged with: testing

Nov 26 2013


Better Than Unit Tests

Don't get me wrong, I love unit testing. The practice of unit testing is probably the most important quality innovation in my whole career. Unit testing has spread beyond the agile development community where it started into the mainstream, and we are all better off for it.

We need to be aware of some serious limitations, though. For example, in "Out of the Tar Pit", Moseley and Marks say:

The key problem with testing is that a test (of any kind) that uses one particular set of inputs tells you nothing at all about the behaviour of the system or component when it is given a different set of inputs. The huge number of different possible inputs usually rules out the possibility of testing them all, hence the unavoidable concern with testing will always be, "have you performed the right tests?" The only certain answer you will ever get to this question is an answer in the negative --- when the system breaks.

We can only write unit tests with a certain number of input cases. Too few, and you miss an important edge case. Too many, and the cost of maintaining the tests themselves becomes onerous.

Worse yet, we know that unit tests are inadequate when we need to test overall system properties, behavior involving GUIs, or concurrency.

So here are four testing strategies that each supplement unit tests with more ways to gain confidence in your fully assembled system.

Automated Contract Testing

Automated Contract Testing uses a data-oriented specification of a service to help with two key tasks:

  • Exercise the service and verify that it adheres to its invariants.
  • Simulate the service for development purposes.

It looks like this:

Automated Contract Testing

Some things to consider when using this type of testing:

  • The contract model should be written by a consumer to express only the parts of the service interface they care about. (If you overspecify by modeling things you don't actually use, then your tests will throw false negatives.)
  • The supplier should not write the contract. Consumers write models that express their desired interface partly to help validate their understanding of the protocol. They may also uncover cases that are in the supplier's blind spot.
  • The test double should not make any assumptions about the logical consistency of the results with respect to the parameters. You should only be testing the application code and the way it deals with the protocol.
  • For example, if you are testing an "add to cart" interface, do not verify that the item you requested was the one actually added. That would couple your test to the implementation logic of the back-end service. Instead, simply verify that the service accepts a well-formed request and returns a well-formed result.

A nice library to help with this style of test is Janus.
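The consumer-driven shape of these contracts can be sketched in a few lines of plain Ruby. Everything below is invented for illustration (the service, fields, and helper names are hypothetical); a library like Janus packages the same idea with recording and playback support:

```ruby
# The contract is pure data: only the shape this consumer relies on,
# with no assertions about back-end logic.
ADD_TO_CART_RESPONSE = {
  "status"  => String,
  "item_id" => Integer,
  "total"   => Numeric
}

# Verify well-formedness only: every field we care about is present
# with the right type. Extra fields are deliberately ignored, and we
# never check the logical consistency of the values.
def conforms?(contract, response)
  contract.all? do |field, type|
    response.key?(field) && response[field].is_a?(type)
  end
end

# The same contract can drive a test double for development:
def stub_response(contract)
  samples = { String => "ok", Integer => 1, Numeric => 0.0 }
  contract.each_with_object({}) do |(field, type), stubbed|
    stubbed[field] = samples.fetch(type)
  end
end

real = { "status" => "ok", "item_id" => 42, "total" => 19.99, "coupon" => nil }
puts conforms?(ADD_TO_CART_RESPONSE, real)
puts conforms?(ADD_TO_CART_RESPONSE, { "status" => 200 })
```

Note that the contract is written from the consumer's side: the `"coupon"` field in the real response is simply ignored, because this consumer never modeled it.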

Property-based Testing

Property-based testing is a derivative of the "formal specifications" ideas. It uses a model of the system to describe the allowed inputs, outputs, and state transitions. Then it randomly (but repeatably) generates a vast number of test cases to exercise the system. Instead of looking for success, property-based testing looks for failures. It detects states and values that could not have been produced according to the laws of the model, and flags those cases as failures.

Property-based testing looks like this:

Property-based testing with QuickCheck

The canonical property-based testing tool is QuickCheck. Many partial and fragmentary open-source tools claim to be "QuickCheck clones," but they lack two really important parts: search-space optimization and failure minimization. Search-space optimization uses features of the model to probe the "most important" cases rather than relying on sheer brute force. Failure minimization is what makes failures useful: upon finding a failing case, minimization kicks in and searches for the simplest case that recreates the failure. Without it, understanding a failure case is just about as hard as debugging an end-user defect report.

Considerations when using property-based testing:

  • Specifying the model is a specialized coding task. The model is often 10% of the size of the original system, but can still be very large.
  • Test failures sometimes indicate a problem in the system and sometimes a problem in the model. Most business systems are not very well specified (in the rigorous CS sense of the term) and suffer from many edge cases. Even formal standards from international committees can be riddled with ambiguities and errors.
  • If the system under test is not self-contained, then non-repeatable test failures can confuse the framework. Test isolation (e.g., mocks for all integration points) is really essential.
  • This is not a cheap approach. Developing a robust model can take many months. If a company commits to this approach, then the model itself becomes a major investment. (In some ways, the model will be more of an asset than the code base, since it specifies the system behavior independently of any implementation technology!)
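The shape of the generate-and-minimize loop can be shown in miniature. This is a toy sketch, not QuickCheck: the buggy sort is planted deliberately, and real tools generate and shrink far more cleverly.

```ruby
# System under test, with a planted bug: duplicates are silently dropped.
def buggy_sort(xs)
  xs.uniq.sort
end

# The "law" from the model: sorting preserves length and orders its output.
def property_holds?(xs)
  out = buggy_sort(xs)
  out.size == xs.size && out.each_cons(2).all? { |a, b| a <= b }
end

# Repeatable generation: a fixed seed makes every failure reproducible.
rng = Random.new(42)
failing = nil
200.times do
  xs = Array.new(rng.rand(0..10)) { rng.rand(-5..5) }
  unless property_holds?(xs)
    failing = xs
    break
  end
end

# Naive failure minimization: greedily drop elements while the smaller
# case still violates the property.
def minimize(xs)
  xs.each_index do |i|
    smaller = xs[0...i] + xs[(i + 1)..-1]
    return minimize(smaller) unless property_holds?(smaller)
  end
  xs
end

minimal = failing && minimize(failing)
puts "failing case: #{failing.inspect}"
puts "minimized:    #{minimal.inspect}"
```

The minimizer reliably reduces whatever large random array first failed down to a two-element duplicate pair, which makes the planted bug obvious at a glance. That is the payoff of minimization: the reported case is the simplest one that recreates the failure.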

Fault Injection

Fault Injection is pretty much what it sounds like. You run the system under test in a controlled environment, then force "bad things" to happen. These days, "bad things" mostly means network problems and hacking attacks. I'll focus on the network problems for now.

One fault injection tool that has produced some interesting results lately is Jepsen. Jepsen's author, Kyle Kingsbury, has been able to demonstrate data loss in all of the current crop of eventually consistent NoSQL databases. You can clone that repo to duplicate his results.

It looks like this:

Fault Injection With Jepsen

Jepsen itself runs a bunch of VMs, then generates load. While the load is running against the system, Jepsen can introduce partitions and delays into the virtual network interfaces. By introducing controlled faults and delays in the network, Jepsen lets us try out conditions that can happen "in the wild" and see how the system behaves.

After running the test scenario, we use a validator to detect incorrect results.


Considerations when using fault injection:

  • Jepsen itself doesn't provide much help with validation. As delivered, it just tries to store a monotonic sequence of integers. For application-specific tests, you must write a separate validator.
  • Load generation needs to be predictable enough that we can verify the data and messages coming out of the system. That means either scripts or pseudo-random cases with a controlled seed.
  • This is another test method that can't prove success, but can detect failures.
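The validation step itself can be tiny. Here is a sketch in the spirit of Jepsen's monotonic-integer check; the data is invented for illustration. Suppose the clients recorded which writes the database acknowledged during the partition, and we read back what actually survived after the partitions healed:

```ruby
acknowledged = [1, 2, 3, 4, 5, 6]  # writes the database claimed succeeded
survived     = [1, 2, 4, 5]        # what the healed cluster actually holds

# Acknowledged-but-lost writes are the serious failures:
lost = acknowledged - survived

# Unacknowledged writes that survive are merely surprising:
phantom = survived - acknowledged

puts "acknowledged but lost: #{lost.inspect}"
puts "surviving but unacknowledged: #{phantom.inspect}"
```

A non-empty `lost` set is exactly the kind of result Kingsbury reports: the database told a client its write succeeded, and the write is gone.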

Simulation Testing

Simulation testing is the most repeatable of these methods. In simulation testing, we use a traffic model to generate a large volume of plausible "actions" for the system. Instead of just running those actions, though, we store them in a database.

The activity model is typically a small number of parameters to describe things like distribution of user types, ratio of logged-in to not-logged-in users, likelihood of new registrations, and so on. We use these parameters to create a database of actions to be executed later.

The event stream database will be reused for many different test runs, so we want to keep track of which version of the model and event generator were used to create it. This will be a recurring pattern with simulation testing: we always know the provenance of the data.
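A sketch of that generation step, in Ruby for brevity: the parameters, version strings, and event kinds are all invented, and Simulant does this kind of generation (and storage) with Datomic rather than in-memory structures.

```ruby
MODEL_VERSION = "activity-model-v7"

# Hypothetical traffic model: a few parameters, not a detailed script.
model = { new_registration_rate: 0.1, logged_in_ratio: 0.6 }

# Seeded generation: the same model and seed always yield the same
# stream, so a test run's inputs can be reproduced exactly.
def generate_events(model, seed, count)
  rng = Random.new(seed)
  Array.new(count) do |i|
    kind =
      if rng.rand < model[:new_registration_rate]
        :register
      elsif rng.rand < model[:logged_in_ratio]
        :member_action
      else
        :anonymous_action
      end
    { t: i, kind: kind }
  end
end

events = generate_events(model, 12_345, 1_000)

# Store provenance with the data: which model and seed produced it.
stream = { model_version: MODEL_VERSION, seed: 12_345, events: events }
puts "#{stream[:events].size} events generated by #{stream[:model_version]}"
```

Persisting `stream` rather than just `events` is the point: any later test run can name exactly which model version and seed produced its inputs.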

Simulation Testing with Simulant

The simulation runner then executes the actions against the system under test. The system under test must be initialized with a known, versioned data set. (We'll also record the version of starting dataset that was used.) Because this runner is a separate program, we can turn a dial to control how fast the simulation runs. We can go from real time, to double speed, to one-tenth speed.

Where most test methods would verify the system output immediately, simulation testing actually just captures everything in yet another database. This database of outputs includes the final dataset when the event stream was completed, plus all the outputs generated by the system during the simulation. (Bear in mind that this "database" could just be log files.) These are the normal outputs of the system, just captured permanently.

Like everything else here, the output database is versioned. All the parameters and versions are recorded in the test record. That way we can tie a test report to exactly the inputs that created it.

Speaking of the test report, we run validations against the resulting dataset as a final process. Validations may be boolean checks for correctness (is the resulting value what was expected?) but they may also verify global properties. Did the system process all inputs within the allowed time? Were all the inputs accepted and acted on? Does "money in" balance with "money out"? These global properties cannot be validated during the simulation run.

An interesting feature of this model is that validations don't all need to exist when you run the simulation. We can think of new things to check long after the test run. For example, on finding a bug, we can create a probe for that bug and then find out how far back it goes.
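Because validations are just functions over the captured output database, adding one later is trivial. The records below are invented for illustration; the point is that each check runs after the fact, against the permanent test record:

```ruby
# A captured output database, reduced to an in-memory list of records.
outputs = [
  { event: :payment, amount_in: 100, amount_out: 0 },
  { event: :refund,  amount_in: 0,   amount_out: 40 },
  { event: :payment, amount_in: 60,  amount_out: 0 },
  { event: :payout,  amount_in: 0,   amount_out: 120 }
]

# A global property that could not be checked mid-run:
validations = {
  "money in balances money out" => lambda do |recs|
    recs.sum { |r| r[:amount_in] } == recs.sum { |r| r[:amount_out] }
  end
}

# A check written today still works against last month's recording:
validations["no negative amounts"] = lambda do |recs|
  recs.all? { |r| r[:amount_in] >= 0 && r[:amount_out] >= 0 }
end

report = validations.map { |name, check| [name, check.call(outputs)] }.to_h
report.each { |name, ok| puts "#{ok ? 'PASS' : 'FAIL'}  #{name}" }
```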

Simulant is a tool for building simulation tests.


Considerations when using simulation testing:

  • This approach has some similarities to property-based testing. There are two major differences:
    • Property-based testing has a complete formal model of the system used for both case generation and invariant checking. In contrast, simulation testing uses a much simpler model of incoming traffic and a separate set of invariants. Splitting these two things makes each one simpler.
    • Property-based testing runs its assertions immediately. To add new assertions, you must re-run the whole test.
  • Because assertions are run after the fact, you can verify global properties about the system. For example, does "money in" balance with "money out"?
  • Using virtual machines and storage for the system under test makes it much simpler to initialize with a known good state.
  • We can create multiple event streams to represent different "weather conditions." E.g., a product launch scenario versus just a heavy shopping day.
  • The event stream doesn't need to be overly realistic. It's often better to have a naive transition model informed by a handful of useful parameters. Generate lots of data volume instead of finely modeled scenarios.
  • When there are failures, they are usually very obvious from the assertions. The tougher part is figuring out how a bad final state happened. More output in the test record is better.
  • If you squint, this looks like a load testing model. While it's possible to use this for load testing, it can be awkward. We've found a hybrid approach to be better: run a load generator for the "bulk traffic," then use simulation testing to check correctness under high load conditions. Of course, you sacrifice repeatability this way.

May 12 2009


Blue Ridge 1.0: JavaScript Unit Testing for Rails. Scandalous!

You wouldn’t consider developing a Rails application without having a solid test suite for your Ruby code, but you’ve somehow convinced yourself to cross your fingers and look the other way when it comes to JavaScript. It doesn’t have to be that way.

Meet Blue Ridge, a Rails plugin that brings the goodness of test-driven and behavior-driven development to your unobtrusive JavaScript code in a Rails-friendly manner.

Blue Ridge

Historically, when selecting a JavaScript testing solution, you were forced to choose whether you wanted a framework that could run your tests in the browser or one that could only run your tests in a headless fashion. By providing a friendly convention-over-configuration wrapper around a collection of open source tools, Blue Ridge gives us the best of both worlds: fast, automation-friendly, and headless testing plus the ability to run your tests in whichever browser is acting up on any given day.

Last summer, Blue Ridge was just a twinkle in our eye. Last week, Blue Ridge made its public debut at RailsConf. Today, we’re pleased to announce version 1.0.

Getting Started

First, install Blue Ridge into your Rails app:

  ./script/plugin install git://
  ./script/generate blue_ridge

Blue Ridge creates a small example spec to get you started. Run your JavaScript specs to make sure that all is well so far:

  rake test:javascripts

(Hint: You can also use the `spec:javascripts` or `examples:javascripts` aliases. Blue Ridge is compatible with your testing framework of choice, be it test/unit, RSpec, Micronaut, Shoulda, test-spec, etc.).

Next, try running an individual spec. When you installed Blue Ridge, it created a spec file named “application_spec.js”. You can run it like so:

  rake test:javascripts TEST=application

Of course, you really want to generate a spec that’s specific to your app. Let’s say you want to write some tests for a JavaScript file called “public/javascripts/graphics.js”. Run:

  ./script/generate javascript_spec graphics
  rake test:javascripts TEST=graphics

Cool. We can run our JavaScript specs from the command line, and they’re fast! But you just got a bug report regarding JavaScript errors in IE6 (die already!), and you want to write some tests to prove (and then resolve) the bug. To run your spec inside a web browser, simply load the HTML fixture associated with the spec (e.g., “test/javascripts/fixtures/graphics.html” for the tests in “graphics_spec.js”) and see the results of running the tests in that specific browser.

Check out the README for more information on the directory layout and a detailed description of the various Blue Ridge components.


Blue Ridge wouldn’t quite fit into the Rails ecosystem if it didn’t come equipped with a few opinions. So by default, it assumes you’re using jQuery. “application_spec.js”, which was generated when you installed the plugin, includes an example of calling jQuery functions inside a test.


  Screw.Unit(function() {
    describe("Your application javascript", function() {
      it("provides cliché example", function() {
        expect("hello").to(equal, "hello");
      });

      it("accesses the DOM from fixtures/application.html", function() {
        expect($('.select_me').length).to(equal, 2);
      });
    });
  });
And, no, we don’t actually encourage you to write tests for standard libraries like jQuery and Prototype; it just makes for an easy demo.


If you prefer Prototype, no problem: “Have it your way.”


  require("../../public/javascripts/prototype.js", {onload: function() {

    Screw.Unit(function() {
      describe("Your application javascript", function() {
        it("accesses the DOM from fixtures/application.html", function() {
          expect($$('.select_me').length).to(equal, 2);
        });
      });
    });
  }});
Put jQuery into “no conflict” mode to give the `$` function back to Prototype, require `prototype.js`, and chain any files that are dependent on `prototype.js` in the `onload` callback. Done. Your Prototype-based tests await you.

Act Now, and We’ll Throw in Mocking Support

Blue Ridge includes Smoke, a JavaScript mocking and stubbing toolkit somewhat similar to Mocha. Assume you’re testing a function called `calculateTotalCost`, and you want to ensure that it calls `calculateComponentPrice` for each given component. That test might look something like so:

  it("calculates the total cost of a contract by adding the prices of each component", function() {
    var componentX = {}, componentY = {};
    mock(SalesContract).should_receive("calculateComponentPrice")
      .with_arguments(componentX).exactly(1, "time").and_return(42);
    mock(SalesContract).should_receive("calculateComponentPrice")
      .with_arguments(componentY).exactly(1, "time").and_return(24);
    expect(SalesContract.calculateTotalCost([componentX, componentY])).to(equal, 66);
  });

Fun with TextMate

Running individual specs from the command line is an essential feature, but if you use TextMate like we do, you’d rather not have to leave your editor in order to run a spec. If this describes your development style, take the Blue Ridge TextMate Bundle for a spin.

  cd ~/Library/Application Support/TextMate/Bundles/
  git clone git:// Blue\ Ridge.tmbundle

Once you reload your bundles (or restart TextMate), just hit Command-R to run the currently-open spec directly from TextMate.

And be sure to check out the snippets as well. Type `it`, `des`, `bef`, or `aft`, and then press the tab key to expand into full `it` blocks, `describe` blocks, etc.

But Wait, There’s More

Want more examples? To see Blue Ridge in action inside a working Rails app, check out the Blue Ridge sample application. Among other things, the sample app includes examples of:

  • using nested `describe` functions
  • setting up per-spec HTML “fixtures”
  • stubbing and mocking functions
  • running the Blue Ridge specs as part of your default Rake task

Money Back Guarantee

Blue Ridge is guaranteed to be bug free and fulfill your every need, or your money back! Of course, in lieu of a full refund, you’re welcome to hop over to the project’s GitHub issue tracker to let us know about any issues or ideas for possible improvements. Even better, fork the repo and start hacking! If you have patches, send us pull requests.

Now, go forth and give your JavaScript code the testing love it deserves!

Apr 30 2009


JavaScript Testing at RailsConf

Jason and I will be at RailsConf next week speaking about Blue Ridge, our Rails plugin that makes test-driven JavaScript development easy and natural!

Here's our abstract:

> Learn how to enjoy the benefits of test-driven development beyond just your Ruby on Rails code. JavaScript is code too, and it deserves tests! With the help of some handy plugins, Rails lets you test your unobtrusive JavaScript using tools such as Screw.Unit and Smoke. The tools and approach are library-agnostic; they work well with jQuery, Prototype, and others.

The talk is at 1:50 PM on Tuesday, May 5. Come check it out, and give your JavaScript code the testing love it deserves.

Apr 01 2009


Micronaut: Innovation Under the Hood

Last week, Rob announced Micronaut, our new BDD framework written by Chad Humphries. I’ve been itching to write about Micronaut as well, and my take on it is similar to Rob’s, although I start with a very different perspective.

It’s a little surprising that I’m excited about Micronaut. I’m sort of a Ruby old-timer, and I’ve never been that excited about RSpec or any of the other Ruby test frameworks that have appeared over the last few years. Yes, they offer some advantages in some cases, but I’ve always seen the improvements as incremental rather than revolutionary. I like RSpec’s facilities for structuring tests, but `should` never struck me as a radical improvement in expressiveness. In fact, I frequently run into situations where I think a well-crafted assertion is much clearer than a `should` expression.

I just never got the BDD religion, in other words.

What I like about Micronaut is that its innovation is mostly under the hood, rather than on the surface. Rather than designing some new variation on how we express our tests, Chad opted for RSpec compatibility and built a really nice engine for loading and running RSpec-style tests. Architecture matters, and Micronaut’s architecture is a joy.

First of all, as Rob noted, Micronaut is small, easy to understand, and fast. That makes it fun to hack on, and great to use on projects. It also gives me confidence; simple tools tend to be more robust, and I want that in my testing tool.

Second, Micronaut’s architecture provides a lot of flexibility. To illustrate that, while attending talks at QCon a couple weeks ago, I added test/unit compatibility to Micronaut. Here’s a short example that mirrors Rob’s example from last week, but in a test/unit style:

  require 'micronaut/unit'

  Micronaut.configure do |config|
    config.filter_run :focused => true
  end

  class MicronautMetadataTest < Micronaut::Unit::TestCase

    # 'testinfo' allows declaring metadata for a test, and works
    # like Rake's 'desc' method: it applies to the next test.

    testinfo :focused => true
    def test_this_will_definitely_run
      assert self.running_example.metadata[:focused]
    end

    def test_this_never_runs
      flunk "shouldn't run while other specs are focused"
    end

    testinfo :pending => true
    def test_this_is_pending
    end

    testinfo :speed => 'slow'
    def test_too_slow_to_run_all_the_time
      assert_equal 4, 2+2
    end
  end
I don’t know how interesting test/unit compatibility is in its own right. I certainly wouldn’t claim that the example above is superior to the RSpec equivalent. But it was a fun exercise to really prove Micronaut’s design. It only took a few hours (of partial attention) and it clocks in at under 200 lines of code (plus `assertions.rb` from the test/unit codebase). I love the idea of the testing API and the test execution engine not being so tightly coupled. And it might be very useful if you have existing tests written with test/unit and you’d like to gain the benefit of Micronaut’s metadata support.

Which brings me to Micronaut’s best feature. Almost all unit-testing frameworks incorporate some support for test suites, but suites have never lived up to their promise. It would be nice to have tests organized in suites for distinct purposes, but the setup and maintenance costs associated with suites have meant that very few teams make good use of them. Micronaut’s metadata is a different—and much better—solution to the same problem. As some have noted, the idea is similar to Cucumber’s support for tags. Just attach metadata of your choice to Micronaut tests or examples, and then you can use that metadata to select what tests are run in different situations.
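The core idea is small enough to sketch. These structures are invented for illustration and are not Micronaut's internals: examples carry arbitrary metadata, and the runner filters on it instead of relying on hand-maintained suites.

```ruby
examples = [
  { name: "fast check",     metadata: { speed: "fast" } },
  { name: "slow migration", metadata: { speed: "slow" } },
  { name: "focused bug",    metadata: { focused: true } }
]

# Keep only the examples whose metadata matches every filter pair.
def select_examples(examples, filter)
  examples.select do |ex|
    filter.all? { |key, value| ex[:metadata][key] == value }
  end
end

# Run only focused examples, like `config.filter_run :focused => true`:
focused = select_examples(examples, focused: true)
puts focused.map { |ex| ex[:name] }.inspect

# Or select the fast ones for a quick pre-commit run:
fast = select_examples(examples, speed: "fast")
puts fast.map { |ex| ex[:name] }.inspect
```

Because the metadata is just data attached at definition time, a single test base can serve many purposes: smoke runs, slow nightly runs, focused debugging sessions.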

At the moment, test/unit compatibility is only available in my fork of Micronaut. It allows using RSpec-style `describe` and `it` blocks alongside test/unit-style test methods, and supports test blocks à la Context and ActiveSupport. I don’t think we’ll pull this completely into Micronaut; I think it would work better as a separate gem. But I’d love to get your feedback about the whole idea.

Jul 31 2008


Fully Headless JSSpec

Update 2009-05-12 - With the release of Blue Ridge 1.0, fully headless JavaScript testing just got a whole lot easier! Check out the plugin for the current state-of-the-art approach to unit testing JavaScript in your Rails projects.

I have been frustrated at the state of testing for JavaScript. There are a bevy of great (and not-so-great) choices for testing your JavaScript, but nothing that is current, effective, and automated. We used Crosscheck for a while, but the project went stagnant and was awfully heavy-weight besides. On top of which, it didn't offer a BDD-flavored spec syntax.

I've been a fan of JSSpec for a while, but its "runs tests only in the browser" style meant that automated, server-side continuous testing was nigh upon impossible.

Then, along came John Resig and his marathon weekend of hacking to mock out the DOM API in a relatively simple JavaScript file and voila! Headless JSSpec testing for Rails apps could become a reality.

If you are interested in trying it out, first download and install Rhino 1.7R1. Make sure you put the js.jar file on a globally accessible path somewhere. Then, get a copy of my javascript_testing repo at Github ( It is a minimal repo, with just enough to get you started. The files in the repo are:

  • env.js -- a clone of John Resig's browser environment file, with a couple of minor tweaks for support
  • test_jsspec.rake -- a couple of rake tasks for running the tests, and for integrating them with CruiseControl (will add support for other CI solutions as I can)
  • jsspec/jsspec.js -- an unmodified copy of the jsspec trunk, here for your convenience
  • jsspec/config.js -- a series of, essentially, global includes for your specs. Edit to suit your own environment
  • jsspec/consoleReportForRake.js -- an extension to JSSpec that adds an afterAll hook so that errors blow up your Rake task appropriately (for continuous integration purposes)
  • jsspec/prototypeForJsspec.js -- a minor tweak to Prototype to get it working with the env.js file

To get rolling, copy everything but the rake task into RAILS_ROOT/test/javascripts. Put the rake task in RAILS_ROOT/lib. Edit jsspec/config.js to load any standard JavaScript files your project uses. Then, start writing specs. Here's an example, RAILS_ROOT/test/javascripts/spec_string.js:

with(Spec) {
   describe("Prototype's String.stripTags", function() { with(this) {

     it("should remove open and close tags", function() {
       value_of("<b>hello</b>".stripTags()).should_be("hello");
     });

     it("should remove self-closing tags", function() {
       value_of("a<br/>b".stripTags()).should_be("ab");
     });
   }});
};

Spec.Runner.reporter = "ConsoleReport";
Spec.Runner.run();

To run this, use the targeted version of the rake task:

:trunk:-> rake test:jsspec TEST=spec_string.js
(in /Users/jgehtland/RELEVANCE/project/trunk)

2 passed. 

Run 2 examples in 0.001seconds.


(Simply running rake test:jsspec will run any file in your test/javascripts folder whose name begins with "spec" and report the results.)

We don't actually encourage you to write specs and tests for standard libraries like Prototype, jQuery, etc. It just makes for an easy demo.

If you want to test something that requires some HTML to bind against, I generally create a folder called test/javascripts/fixtures, and load HTML exemplars in there. For example, to test Prototype's $$ selector, you might create a fixture called fixtures/selector.html that contains the following:

  <div class="select_me"/>
  <span class="select_me"/>
  <div class="dont_select_me"/>

Your test, spec_selector.js, would now contain:

window.location = "fixtures/selector.html";

with(Spec) {
   describe("Prototype's $$ selector", function() { with(this) {

     it("should find only elements with the provided class selector", function() {
       value_of($$('.select_me').length).should_be(2);
     });

     it("should find only elements with the provided tag", function() {
       value_of($$('div.select_me').length).should_be(1);
     });
   }});
};

Spec.Runner.reporter = "ConsoleReport";
Spec.Runner.run();

If you want these tests to be part of your regular CI process, just uncomment the second task in test_jsspec.rake and you can now have CI for your JS, ASAP.
