BDD is Like Jazz

One of the most famous jazz records out there is called ‘Kind of Blue’ by Miles Davis, which you might have come across. We’ll get back to why we’re using a jazz record to describe Behaviour-Driven Development (BDD) later, but here are the songs on the album.


At CukeUp! London 2015, there was a panel called ‘WTF is BDD?’. If you aren’t familiar with what BDD is before watching the recording, you may be even more confused after watching it. The panel of BDD luminaries couldn’t agree on a single thing. They certainly couldn’t come up with any concrete examples of what BDD is. Instead, it’s a disappointing discussion about how it’s important not to define BDD.

Before we try to define BDD, let’s take a quick tour through the history of software development. The purpose of software development is to deliver solutions to business problems, and there’s a continuing challenge to verify that the software delivered actually satisfies the requirements. Waterfall methods had slow feedback cycles built into them, allowing projects to go seriously off track. In response, the industry began to experiment with lightweight methodologies like XP and Scrum, which incorporate short iterations for faster feedback. These went on to be given the collective name ‘agile’.

Agile approaches don’t magically fix all the problems experienced by organizations that develop software. The persistent problem is that iterations are often mis-implemented as mini-waterfalls. Teams spend weeks implementing a piece of functionality. After that comes testing to make sure they got it right. When they discover a mistake, they have to fix the problem and retest the story — a tedious and time-consuming process. The resulting delay in feedback continues to limit the benefits that agile development methods can deliver.

Test-Driven Development (TDD) is a technical practice that helps speed up the feedback loop by demanding that developers write automated tests before they write the code. The actual purpose of the tests is to define the behavior of the application, but because the tests themselves are code, they become detached from the business requirement as soon as they’re written. And as time goes by, this gap gets bigger and bigger. As we’ll see, BDD builds on TDD, while preserving a strong link between the business requirements and the technical solution.

BDD is valuable because it’s a set of principles and practices that keep evolving; we borrow from other methodologies, and new people who come and go teach us new things. So, it’s important not to try to nail down exactly what it is. Unfortunately, this sort of imprecision is a problem for people who are new to the techniques and just want to learn how to get value from BDD.

Chris Matts said one really smart thing on that panel. He said, “BDD is like jazz” – so you can’t fully describe what it is, but you can cut a vinyl of what it is today, for this particular band or that particular group. So in this article, you’ll get a whirlwind tour of what BDD is like from the perspective of Cucumber Limited (the company behind the open-source tool, Cucumber). Currently, this is how we practice BDD, and how we teach BDD when we work with organizations who are adopting BDD.

First of all, BDD is not like an orchestra where you have a conductor, a project manager or a product owner who instructs everyone on what to do. Many software projects are like that. You have this ‘conductor’ who writes down all the requirements and just sends them to the software development team, and they try to make sense out of it. That’s not how BDD works. It’s also not something that you can do alone; it’s not a solitary activity. You can’t just download Cucumber and start doing BDD. Well, maybe you can if you’re experienced with it, but if you’re new to it, that is certainly not going to get you very far.

BDD is much more of a jamming, jazzy kind of thing, where you have people constantly improvising, following some rules. There are some rules in there, but it’s different and new every time. There is no score in BDD.

The album

The first song off the album is Discovery Workshop, with The Three Amigos. This is where everything starts. You pick a story off the backlog and take it into a room. You bring along with you three Mexican hats – very important – and underneath those hats, you put a developer, a tester and a business representative – it can be a product owner or a business analyst or a domain expert, and it doesn’t have to be three. You might bring more than three people – you can have a UX person in there, you can have somebody who’s doing DevOps – and you might have more than one developer, tester or business person. What’s not negotiable is that you have representatives from business and IT. And from IT, you have representatives from both developers and testers.

Together, they run this short, focused Discovery Workshop (aiming to last 25 minutes or so) using a technique that Matt Wynne called Example Mapping. It’s a simple way to take your user story and break it down into smaller pieces, which allows you to get more details and discover misunderstandings. Discovering misunderstandings is, in our experience, one of the key goals of BDD. We don’t do BDD because we want to do BDD. BDD is just a means to an end. What we want to do is to deliver high-quality software, quickly, and do that over time, over months and years. To do that, you need to get lots of feedback really quickly, all the time, and it starts when you get feedback on your understanding. Therefore, this is where BDD starts. You test your understanding, and often realize that there’s a misunderstanding between these groups.

After 25 minutes, you come up with something that might look like this:

This is a real example map that we created in our company for the product that we’re developing, Cucumber Pro, a collaboration tool for Cucumber. This is for the ‘search’ feature. For this particular user story, we’ve discovered a couple of rules. One rule is that we should only index Gherkin files; we shouldn’t index other kinds of files, like Java files and JavaScript files. The other rule is that the search should be scoped to a project. We thought the two business rules were essential in this first user story. Since rules on their own can still harbor misunderstandings, we also captured concrete examples to illustrate each of the rules. In the diagram, examples were written on green cards and rules on blue cards.

Some questions came up while we were talking about this story. Should we do a fuzzy match when we search or use an index? What about tags in Gherkin files – do they get treated differently from normal text? We don’t know. So rather than spend a lot of time discussing this now, let’s just jot them down and move on.
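To make the shape of an example map concrete, here is a minimal sketch of it as a data structure. The class and field names are our own illustration, not part of any Cucumber tool, and the second rule carries no examples because the text above doesn’t list them.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A business rule (blue card) plus the examples (green cards) that illustrate it."""
    text: str
    examples: list[str] = field(default_factory=list)


@dataclass
class ExampleMap:
    """One user story broken down into rules, examples, and open questions."""
    story: str
    rules: list[Rule] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)


# Hypothetical transcription of the 'search' example map described above.
search_map = ExampleMap(
    story="Search",
    rules=[
        Rule("Only index Gherkin files", examples=["No hits from a Java file"]),
        # The article does not name the examples for this rule, so none are invented.
        Rule("Search is scoped to a project"),
    ],
    questions=[
        "Fuzzy match or index?",
        "Are tags treated differently from normal text?",
    ],
)
```

The point of keeping questions as a separate list is that they can be parked and answered later without holding up the workshop.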

The next thing to do is to translate this into Gherkin. People do it differently. In our case, we introduced a little intermezzo called ‘Don’t Gherkin too soon’. Normally, a lot of teams either have a business analyst or the product owner write the Gherkin, or they write the Gherkin during the discovery workshops. Really bad idea. It’s a bad idea to write it down during the discovery workshop because it’s going to take more than 25 minutes. As a result, some of the amigos will get bored and you’ll come under pressure to stop having these valuable meetings. Instead, we recommend that you don’t write Gherkin in those meetings when you do Example Mapping – capture the rules and examples in whatever format can be written down quickly and is understandable to everyone.

It’s also a bad idea to have somebody representing the business writing the Gherkin, putting it into JIRA and then assigning it to a developer, because by doing so, you miss out on the whole conversation. You’ll be more productive if just two of the three amigos work on the Gherkin after the meeting. One of them should be a developer, ideally paired with a tester. In a perfect world, your business folk will have the time (and inclination) to work on the Gherkin collaboratively with the other amigos, but in our experience, they’re often too busy. We don’t recommend that the business people write the Gherkin on their own because this misses an essential opportunity for the technical team to demonstrate that they’ve correctly understood the business requirements.

So how do we do this? The feature file that results from this example map is shown on the right-hand side of the figure above. The feature that the user story contributes to is described in the title. Then come the rules and the questions, which aren’t used when Cucumber executes the feature file. However, they are useful documentation of the specification agreed at the three amigos meeting. They can be written using Markdown syntax so that they are nicely rendered. Each scenario represents an example that we captured during example mapping. We haven’t written the Given, When, and Then because, as we discussed above, that’s something we’ll do later.

Take a look at section one in the figure above. The scenario is called “No hits from a Java file”, and it consists of three ‘steps’ – a Given, a When, and a Then. We suggest that you write your scenarios backwards. Therefore, start by considering how you would verify that the system behaved correctly. This is called “Starting with the end in mind”, and naturally leads us to express our Then step first. The scenario title tells us that we expect to see “No” hits, and we express this by saying “Then I should get 0 hits”.

When do we expect to see 0 hits? One situation in which we expect to see 0 hits is when we search for something and the search term only exists in a file that we don’t want to get hits from, such as a Java file. We write the When step by describing a search for a concrete term – “Hello”.

In what situation would we expect a search for “Hello” to result in 0 hits? Considering the title of this scenario, we specify a context where the text “Hello” only exists in a Java file, and express this in the Given step. Clearly, there are other situations where a search for “Hello” results in 0 hits – for example the term might not exist anywhere – but this scenario is only interested in illustrating the “Gherkin files only” rule.

Now that we have our scenario written, it’s time to drive out the solution using what Konstantin Kudryashov – the author of Cucumber for PHP (Behat) – calls ‘Modelling By Example’. Take a look at section two in the figure above. We take the step “Given a file with content”, and we try to express that in code. Can you see the resemblance between the plain English sentence and the code here? It’s not exactly a 1:1 mapping, but you’ll find most of the essential pieces – there’s a file and there’s content and there’s a path. We write this before the actual production code is written. This is still test code – what we call the step definition. If you run it now and you’re using a statically typed language, it won’t compile. If you’re using a dynamically typed language, you’ll have a failing scenario. In either case, now you have to write some code (section 3 in the figure above). So you write maybe a little repo class and maybe a little file class. They don’t need to do anything yet, but you’ve sort of pushed some classes and some methods into your system based on the conversation you had in the three amigos meeting.
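Here is a rough sketch of what this stage might look like in Python. It is not the code from the figure: the step wording in the comment, the `World` helper, the path `src/Hello.java`, and the method names `add_file` and `search` are all assumptions made for illustration. Plain functions stand in for step definitions so that the sketch has no dependencies; with a real Cucumber-family tool you would bind them to the Gherkin text instead.

```python
# Scenario: No hits from a Java file
#   Given a file "src/Hello.java" with content "Hello"   (wording assumed)
#   When I search for "Hello"
#   Then I should get 0 hits


class File:
    """Skeleton class: just enough for the step definition to construct one."""

    def __init__(self, path, content):
        self.path = path
        self.content = content


class Repo:
    """Skeleton class: the methods exist but do nothing useful yet."""

    def add_file(self, file):
        pass

    def search(self, term):
        pass


class World:
    """State shared between the steps of a single scenario."""

    def __init__(self):
        self.repo = Repo()
        self.hits = None


def given_a_file_with_content(world, path, content):
    world.repo.add_file(File(path, content))


def when_i_search_for(world, term):
    world.hits = world.repo.search(term)


def then_i_should_get_hits(world, expected_count):
    assert len(world.hits) == expected_count


def run_scenario():
    """Run the three steps in order, the way a Cucumber runner would."""
    world = World()
    given_a_file_with_content(world, "src/Hello.java", "Hello")
    when_i_search_for(world, "Hello")
    then_i_should_get_hits(world, 0)
```

Calling `run_scenario()` at this point fails, because `Repo.search` returns nothing yet. That failure is the signal to start fleshing out the classes.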

The figure above is a visualisation of what we’ve done. We have a Cucumber Scenario (Gherkin and Step Definition) on the left that will lead you to write your first class, perhaps the repo class, then the file class. There is a need for some form of persistence – you need to store the files somehow. Our system actually reads them out of a Git database. However, we could use a relational database, a document database or even the file system. We can now run this scenario and it will exercise all this code, but we don’t go through a UI.

If you go back to the classes in section three of the previous figure, you’ll see that they are empty. When do we flesh out their details? We flesh them out as needed to get the scenario(s) to pass, using unit tests (xU in the figure above). Now that we’ve discovered the outside of the system – the immediate methods that we need to call from our step definitions – some new methods and objects fall out of that and get fleshed out using unit tests, following traditional Kent Beck style TDD.
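As an illustration of that inner loop, the sketch below shows two unit tests and just enough implementation to make them pass. The names are hypothetical, and recognising Gherkin files by their `.feature` extension is our assumption about how the “Gherkin files only” rule could be implemented.

```python
import unittest


class File:
    def __init__(self, path, content):
        self.path = path
        self.content = content

    def is_gherkin(self):
        # Assumption: Gherkin files are recognised by the ".feature" extension.
        return self.path.endswith(".feature")


class Repo:
    def __init__(self):
        self._indexed = []

    def add_file(self, file):
        # Rule from the example map: only Gherkin files are indexed.
        if file.is_gherkin():
            self._indexed.append(file)

    def search(self, term):
        # Return the paths of indexed files whose content contains the term.
        return [f.path for f in self._indexed if term in f.content]


class RepoSearchTest(unittest.TestCase):
    def test_ignores_java_files(self):
        repo = Repo()
        repo.add_file(File("src/Hello.java", "Hello"))
        self.assertEqual(repo.search("Hello"), [])

    def test_finds_term_in_feature_file(self):
        repo = Repo()
        repo.add_file(File("features/greeting.feature", "Feature: Hello"))
        self.assertEqual(repo.search("Hello"), ["features/greeting.feature"])
```

Each test was written first and watched fail before the corresponding line of `Repo` was added; the scenario stays red until enough of these small tests have gone green.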

Running this starts to become a bit slow because we have to run each of these tests against the database. It is not only slow but also difficult: for the tests to behave consistently each time we run them, we have to be very clear about the contents of the database, whether we’re running a unit test or a Cucumber scenario. We can’t have any leftovers from a previous test run, and we certainly can’t safely share this database with anyone else because they might modify it while our tests are running, causing hard-to-diagnose failures.

What we want to do is to have our tests use some kind of stub, instead of an actual database. From the perspective of the domain logic, the behavior looks exactly the same. However, the implementation is an in-memory stub implementation (see the grey circle in the figure above).
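A minimal sketch of such a stub might look like this. `FileStore`, its two methods, and `SearchService` are hypothetical names chosen for the example; the only point is that the domain logic talks to an abstract storage interface and never learns what sits behind it.

```python
from abc import ABC, abstractmethod


class FileStore(ABC):
    """The storage operations the domain logic relies on (hypothetical interface)."""

    @abstractmethod
    def save(self, project, path, content): ...

    @abstractmethod
    def load_all(self, project): ...


class InMemoryFileStore(FileStore):
    """Stub implementation: a dictionary instead of a database."""

    def __init__(self):
        self._files = {}

    def save(self, project, path, content):
        self._files.setdefault(project, {})[path] = content

    def load_all(self, project):
        # Return a copy so callers cannot mutate the stored state.
        return dict(self._files.get(project, {}))


class SearchService:
    """Domain logic: depends on FileStore, not on any particular database."""

    def __init__(self, store):
        self._store = store

    def add_file(self, project, path, content):
        # Assumption: Gherkin files are recognised by the ".feature" extension.
        if path.endswith(".feature"):
            self._store.save(project, path, content)

    def search(self, project, term):
        files = self._store.load_all(project)
        return sorted(path for path, content in files.items() if term in content)
```

Because every test constructs a fresh `InMemoryFileStore`, there are no leftovers from a previous run and nothing to share with anyone else.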

This is something that a lot of people are familiar with, but they don’t know how to do it. Additionally, they don’t know whether they can trust that the stub works in the same way as the real thing – “If I need to read stuff from a database, what confidence can I possibly get by testing my system against something else?”. We think that’s a mental block that people have to overcome in order to adopt this practice and use stubs.

What we’ve adopted is a way for you to gain that confidence, called ‘contract tests’.

Think of our database and our stub as two things that you can plug into the same socket, which is called a port. The driver using the port doesn’t need to know what’s plugged into it on the other side; the green circle (see figure above) only knows that it’s talking to this port to store and retrieve stuff, but it doesn’t know what happens on the other side. We can write unit tests that will talk to this port, and we can run those unit tests for both the real implementation and for the stub implementation. In doing so, we get the confidence that the stub behaves in the same way as the real thing. Our port only needs to offer the functionality that our application needs, which minimizes the amount of work we need to do creating the stub. If, as we go along, we discover that we depend on more functionality provided by the real implementation, then we can add another test and extend the behavior provided by the port. This opens up a whole new universe of nice-to-deal-with automated tests because you can unplug the real implementation for most of your testing; you only plug it back in when you actually boot up the system and put it into production.
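One common way to write contract tests is to put the tests in a shared base class and run them once per implementation. The sketch below assumes a hypothetical two-method port (`save` and `load_all`) and uses SQLite purely as a stand-in for ‘the real thing’; it is not the Git-backed storage our system uses.

```python
import sqlite3
import unittest


class InMemoryFileStore:
    """Stub implementation of the port."""

    def __init__(self):
        self._files = {}

    def save(self, project, path, content):
        self._files[(project, path)] = content

    def load_all(self, project):
        return {p: c for (proj, p), c in self._files.items() if proj == project}


class SqliteFileStore:
    """Stand-in for the production implementation (an assumption for this sketch)."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            "project TEXT, path TEXT, content TEXT, PRIMARY KEY (project, path))"
        )

    def save(self, project, path, content):
        self._db.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?)", (project, path, content)
        )

    def load_all(self, project):
        rows = self._db.execute(
            "SELECT path, content FROM files WHERE project = ?", (project,)
        )
        return dict(rows.fetchall())


class FileStoreContract:
    """The contract: every implementation of the port must pass these tests."""

    def make_store(self):
        raise NotImplementedError

    def test_loads_what_was_saved(self):
        store = self.make_store()
        store.save("p1", "a.feature", "Feature: A")
        self.assertEqual(store.load_all("p1"), {"a.feature": "Feature: A"})

    def test_saving_same_path_overwrites(self):
        store = self.make_store()
        store.save("p1", "a.feature", "old")
        store.save("p1", "a.feature", "new")
        self.assertEqual(store.load_all("p1"), {"a.feature": "new"})

    def test_projects_are_isolated(self):
        store = self.make_store()
        store.save("p1", "a.feature", "Feature: A")
        self.assertEqual(store.load_all("p2"), {})


class InMemoryFileStoreTest(FileStoreContract, unittest.TestCase):
    def make_store(self):
        return InMemoryFileStore()


class SqliteFileStoreTest(FileStoreContract, unittest.TestCase):
    def make_store(self):
        connection = sqlite3.connect(":memory:")
        self.addCleanup(connection.close)
        return SqliteFileStore(connection)
```

If the real implementation later turns out to behave in a way the application depends on, such as a particular ordering, that expectation becomes one more test in `FileStoreContract`, and the stub has to satisfy it too.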

Your tests start to look a bit like this.

The business logic sits inside something that’s completely decoupled from external devices and services. The core business logic doesn’t know anything about databases, message queues, and web services because we’ve isolated them through these ports.

When we want to boot up the system, we just plug in the real implementations. Often, we’ll have these ports connected to adapters that interact with the real implementation. Therefore, there will be an adapter for the database, an adapter for the queue, and an adapter for the web service. Even better, there is a port that defines how we can interact with our system. We can write a little web server which plugs into this port and displays a UI in a browser. Alternatively, you can write a command-line interface (CLI) that plugs into the same port and uses the same functionality. You wouldn’t have to test everything again because you’ve already tested that the business logic works; you just need to test that the UI driver and the CLI driver conform to the same contract offered by the new port.
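To show how thin such drivers can be, here is a sketch of two adapters sitting on one port. `SearchApp`, `cli_adapter`, and `http_adapter` are hypothetical names, and the HTTP adapter only mimics the shape of a web handler; no real server or framework is involved.

```python
class SearchApp:
    """Core business logic exposed through a single port method, search()."""

    def __init__(self, files):
        self._files = files  # mapping of path -> content

    def search(self, term):
        return sorted(p for p, c in self._files.items() if term in c)


def cli_adapter(app, argv):
    """Driving adapter: turn command-line arguments into a port call, then format text."""
    hits = app.search(argv[0])
    return "\n".join(hits) if hits else "0 hits"


def http_adapter(app, query_params):
    """Driving adapter: same port call, but the result is shaped like an HTTP response."""
    hits = app.search(query_params["q"])
    return {"status": 200, "body": {"hits": hits}}
```

Neither adapter contains any search logic, so the only thing left to test in each one is the translation to and from the port.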

Finally, we’ve added some support code, shown as a green circle at the top left of the figure above, which uses Selenium to interact with a browser. You can then switch your step definitions to either run your scenarios through a service-level port or interact with the application through the UI in your browser. Independently, you can decide which downstream ports should have their production or test implementation plugged in. You’ll want to run some tests that go through the whole depth of your stack, to get complete confidence, but if all your tests are like that, you’re in a really bad place.

You’re in a place where if something goes wrong, you can’t diagnose where the problem is because it can be anywhere. What you’re missing is diagnostic precision, something doctors talk about: being able to pinpoint the cause of a problem. It’s hard to do that with full-stack, end-to-end tests. They’re also really brittle because one change can break all your tests. And where do you think your application changes most often? In the UI. That’s what keeps changing all the time because we need to keep up with the latest UI trends and we need to respond to customer feedback. The business logic tends to change much more slowly, at the pace of the business strategy and whatever the business is capable of doing. So, you really want to connect the tests to the part of the application that doesn’t change frequently in order to have more easily maintainable tests.

The last thing is speed. Imagine if we plug in the red, production implementations instead of the grey ones.

Those tests typically run 2-3 orders of magnitude slower than tests using stubbed, in-memory implementations, which can mean 5 seconds or more per test. By contrast, a test that exercises something complicated in the business domain can execute in 5 milliseconds or less. Do you know why? There’s no input or output (IO). All the IO tends to happen outside of these ports. There’s IO when you have a browser involved. There’s IO when you have a database or a web service involved – but we’ve stripped all that stuff away. Hence, we can run thousands of tests in a second, or at least thousands of tests in a minute.
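The arithmetic behind those figures is easy to check. The sketch below takes the 5 millisecond figure from the text, assumes the upper end of the range (three orders of magnitude), and assumes tests run one after another in a single process.

```python
FAST_TEST_MS = 5                 # in-memory test duration, from the text
SLOWDOWN = 10 ** 3               # three orders of magnitude (upper end of "2-3")

slow_test_ms = FAST_TEST_MS * SLOWDOWN        # 5000 ms, i.e. 5 seconds per test

fast_per_second = 1_000 // FAST_TEST_MS       # 200 tests per second
fast_per_minute = 60_000 // FAST_TEST_MS      # 12000 tests per minute
slow_per_minute = 60_000 // slow_test_ms      # 12 tests per minute
```

So at exactly 5 milliseconds per test you comfortably reach thousands of tests per minute. Thousands per second requires tests that run in well under a millisecond, or several processes running in parallel.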

Focus on having different kinds of tests: lots and lots of unit tests, a few tests that don’t go through all the heavy infrastructure components, and just a few tests that go through the UI. This is called the Test Pyramid (see below). This is like the Holy Grail. If you want to do BDD efficiently, you have to be able to decouple your business logic from all of those slow and brittle devices that sit outside of it. The pattern that lets you do this goes by two names: ports and adapters, or hexagonal architecture. Contract tests allow you to gain the confidence that plugging in the stub is just as good as using the real thing. That is the missing link to moving your tests down to these layers, and it’s only when you’ve done that that you’re able to have lots of meaningful feedback quickly.

Concluding thoughts

Let’s recap a bit. Discovery workshops are great for getting feedback on your understanding. You can weed out many bad assumptions before you even start developing if you just put these people in the same room, and run this structured conversation. The three amigos are people who come to this workshop – people who have business, development, and test perspectives. People with different views tend to misunderstand each other, and hence, need to be in the same room. You can’t communicate through JIRA tickets. Example Mapping is a simple technique that we use when we’re in the discovery workshop to have a structured conversation around analysing user stories and breaking them down.

Don’t Gherkin too soon is just a reminder that you shouldn’t fall for the temptation of trying to write Gherkin during the discovery workshop or (worse) during an analysis phase that the business people do before development. It’s something that you do after the discovery workshop.

BAs, don’t be the saucier. In big restaurants you have the chefs, the master chefs – they don’t go and make the sauce in the kitchen. They have minions to do that. For writing Gherkin, those minions should be the developers and testers, because they know how to write it down at a low level of detail. The BAs will come over and taste the sauce and be like ‘yup, that’s good’ – but they don’t have to make it.

Modelling by Example lets the scenarios drive out the implementation. You’ll find that this encourages your code to use the same words and concepts you used when the domain experts described their requirements in the discovery workshop. This avoids a translation cost between problem domain jargon and solution domain jargon. Amazing! It also allows you to do BDD without having to go through the UI – you get much faster feedback using direct method calls. A consequence of this is that you should write your step definitions in the same language as the code you’re writing because they interact through function calls and method calls. If you’re writing something in .NET, you should use SpecFlow. You can’t invoke .NET objects with Cucumber for Ruby. This is really important.

Ensure that you can run your tests underneath your UI. Do not depend solely on UI tests. They are slow, expensive, and hard to fix.

Ports and adapters is an architectural pattern that enables you to hook your scenarios and unit tests at a lower level. The contract tests, finally, are what give you the confidence to do that.

So, this is how we do BDD at Cucumber. We don’t do it this way because we want to do BDD correctly. If there was something else that allowed me to deliver high-quality software at a steady pace and also allowed people to come and maintain that software in my absence, then I would stop applying BDD without giving it a second thought. However, there isn’t. The goal is to deliver high-quality software quickly, and BDD is the best way we’ve found to do that.

Kind of green

BDD is like jazz.

In this article, we’ve presented a fusion of ideas that have been in common use for years within the software community. Like music, you have to get good at development techniques before you throw away the rulebook and start improvising. But like jazz, BDD isn’t a rigorous best practice that you have to follow dogmatically. You’ll have to get to green in your own way – one that matches your skills, your domain, and your organization.

If you have any questions, we’re always willing to offer advice.

Theo England
