Jan 31 2014
Want to meet a Cognitect in person? Here's where you can find us during the month of February:
Dec 23 2013
Ralph Waldo Emerson famously observed:
A foolish consistency is the hobgoblin of little minds.
Given that, I think that Mr. Emerson would be happy to know that "consistency"—the word—is not consistent. There are two separate technical meanings of "consistency," two meanings that technical people tend to conflate.
Consistency as Predicate
The older definition of the word comes from database researchers, especially Jim Gray. He gave us locking, transactions, distributed transactions, consistency, and the ever-popular "ACID," all back when "database" was still spelled "data base." I think we can regard him as authoritative.
In this definition, "consistency" is defined as a boolean function—a predicate—that takes the whole database as an input. When it returns "true", then the database is in a good state. To keep it that way, the server checks every transaction to see if it would make the predicate return false. If so, the server must reject that transaction.
We can regard this definition as having an "interior" perspective. That is, only the database server itself is able to apply the consistency predicate. Most databases do not have any way for an observer to get the complete value of the database at a point in time.
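As a sketch of the idea (a toy example of my own, not any real database's implementation), a server enforcing predicate-consistency applies each transaction to a candidate value and commits only if the predicate still holds:

```python
# Toy sketch of "consistency as predicate" (invented example, not a real DB).

def consistent(db):
    """The consistency predicate over the whole database value:
    every account balance must be non-negative."""
    return all(balance >= 0 for balance in db.values())

def try_transaction(db, tx):
    """Apply tx to a copy; the predicate may be false mid-transaction,
    but the server commits only if the final state satisfies it."""
    candidate = dict(db)
    tx(candidate)
    if consistent(candidate):
        return candidate, True         # commit
    return db, False                   # reject: keep the old value

db = {"alice": 10, "bob": 5}

def transfer(amount):
    def tx(d):
        d["alice"] -= amount
        d["bob"] += amount
    return tx

db, ok = try_transaction(db, transfer(7))    # leaves alice at 3: accepted
db, bad = try_transaction(db, transfer(50))  # would leave alice at -47: rejected
```

Note that the rejected transaction leaves the database value untouched, which is exactly the atomicity the next paragraph describes.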
The idea of a consistency predicate also clarifies the relationship between consistency, atomicity, and isolation. The database server is temporarily permitted to break the consistency predicate while applying a transaction. (As an aside, I think Gray's paper anticipates Datomic by 31 years.) However, it does not allow any outside observer to perceive the intermediate, unacceptable state. To external observers, the change appears atomic. Transactions do not commute, so the server also ensures that the effects of the transactions appear to obey some partial ordering. (Please note the very careful wording in that last sentence.) That's isolation.
Computationally, it is impractical to apply a general predicate over the entire database at the end of every transaction. Implementations therefore use weaker forms of enforcement, typically syntactic constraints (e.g., CHAR(20) NOT NULL) and referential integrity. This trades the full consistency predicate for a lesser guarantee.
As an aside, using this weaker form of the predicate is one reason why we get data cleansing problems like negative ages and Little Bobby Tables. It's because we can't make a big table that has every possible human name, so we apply a weaker rule that says "any sequence of alphanumeric characters could be a name."
Consistency as History
There is a completely separate definition of "consistency" that comes from the distributed systems world. This definition of consistency has more to do with applying the same transactions in the same order at every node in a system. This is the meaning of "consistency" which is used in the CAP Theorem. In that paper, consistency means, "there must exist a total order on all operations such that each operation looks as if it were completed at a single instant." This definition of consistency is about the history of operations and their causality.
That kind of consistency—strict linearizability—is a very high bar. There are many ways to weaken it and exploit "loopholes". CRDTs avoid the need for linearizability altogether by requiring all operations to be commutative. With CRDTs, as long as every node eventually applies the same set of operations, in any order, they will all report the same resulting value.
One form of the word "consistency" talks about the state of the system at a given point in time. The other talks about the history of operations that produced the current state. Are they fully substitutable? No.
For instance, suppose the global consistency predicate says, "The atomic object V holds an integer." There are two operations on V: inc and dec, which increment and decrement its value by one. The starting value is 0 and the representation can grow arbitrarily. Given that definition, we can have "consistency as predicate" without "consistency as history," because we can reorder those operations at will and still have the same value at the end: the predicate holds no matter what the history says. Two nodes that see different histories of operations are inconsistent by the second definition, but consistent by the first.
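A small sketch makes this concrete (my own illustration, not from any CRDT library): two replicas apply the same multiset of inc/dec operations in different orders and still agree.

```python
# Two replicas of the counter V apply the same operations in different orders.
# Their histories differ (inconsistent by the distributed-systems definition),
# yet both satisfy the predicate and report the same final value.
import random

ops = ["inc"] * 5 + ["dec"] * 3        # any multiset of commutative operations

def apply_all(history):
    v = 0                               # starting value is 0
    for op in history:
        v += 1 if op == "inc" else -1
    return v

node_a = list(ops)
node_b = list(ops)
random.shuffle(node_b)                  # node B sees a different history

assert apply_all(node_a) == apply_all(node_b) == 2
```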
So it turns out that "consistency (predicate)" and "consistency (history)" are two distinct ideas that happen to share a word. It is always an error to substitute the distributed systems definition of "consistency" for the C in ACID.
The C in CAP is not the C in ACID.
Dec 09 2013
I love getting to write these posts - welcoming aboard new team members and introducing them to the world as part of Cognitect. Today, I get to do two of them.
First, Paul deGrandis. Paul joined us as a contractor earlier in 2013 and it became obvious early on that he was our type of cat: he's smart, he's committed, and he's just a touch crazy. In his own words:
Paul deGrandis lives for magnificent engineering. Elegant, well-founded, useful solutions to problems that say something about engineering's beauty. He loves metrics, taking on the impossible, and making lives better through technology.
He joined us full time this fall, and we're thrilled to have him.
Second, just this week, David Chelimsky signed on. We've known David forever, it seems, so it is a great pleasure to have him join now. He starts for real in January. In his words:
David Chelimsky has been programming professionally since 1998 and has worked in a variety of industries including trading, web publishing, and agile training/consulting. He was a co-author of The RSpec Book, and the technical and community leader of the RSpec project for 6 years. He speaks regularly at conferences on topics ranging from software testing to design principles. In his spare time, David likes to travel, study Portuguese, and play Brazilian music on his cavaquinho.
We welcome you both and are looking forward to awesome times together!
Dec 03 2013
Want to meet a Cognitect in person? Here's where you can find us during the month of December:
Nov 26 2013
Don't get me wrong, I love unit testing. The practice of unit testing is probably the most important quality innovation in my whole career. Unit testing has spread beyond the agile development community where it started into the mainstream, and we are all better off for it.
We need to be aware of some serious limitations, though. For example, in "Out of the Tar Pit", Moseley and Marks say:
The key problem with testing is that a test (of any kind) that uses one particular set of inputs tells you nothing at all about the behaviour of the system or component when it is given a different set of inputs. The huge number of different possible inputs usually rules out the possibility of testing them all, hence the unavoidable concern with testing will always be, "have you performed the right tests?" The only certain answer you will ever get to this question is an answer in the negative --- when the system breaks.
We can only write unit tests with a certain number of input cases. Too few, and you miss an important edge case. Too many, and the cost of maintaining the tests themselves becomes onerous.
Worse yet, we know that unit tests are inadequate when we need to test overall system properties, in the presence of GUIs, and when concurrency is involved.
So here are four testing strategies that each supplement unit tests with more ways to gain confidence in your fully assembled system.
Automated Contract Testing
Automated Contract Testing uses a data-oriented specification of a service to help with two key tasks:
- Exercise the service and verify that it adheres to its invariants.
- Simulate the service for development purposes.
Some things to consider when using this type of testing:
- The contract model should be written by a consumer to express only the parts of the service interface they care about. (If you overspecify by modeling things you don't actually use, then your tests will throw false negatives.)
- The supplier should not write the contract. Consumers write models that express their desired interface partly to help validate their understanding of the protocol. They may also uncover cases that are in the supplier's blind spot.
- The test double should not make any assumptions about the logical consistency of the results with respect to the parameters. You should only be testing the application code and the way it deals with the protocol.
- E.g., if you are testing an "add to cart" interface, do not verify that the item you requested to add was the one actually added. That is coupling to the implementation logic of the back end service. Instead, simply verify that the service accepts a well-formed request and returns a well-formed result.
A nice library to help with this style of test is Janus.
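To illustrate the shape-not-logic idea, here is a generic sketch (this is not Janus's actual API; the contract format and function names are invented): the consumer specifies only the response fields it uses, and verification checks well-formedness, never back-end behavior.

```python
# Hypothetical consumer-written contract: only the fields this consumer
# cares about, with their expected types. Nothing about server logic.
contract = {
    "cart_id": str,
    "item_count": int,
}

def satisfies(contract, response):
    """Check that a response is well-formed: required fields present,
    with the right types. Extra fields are ignored (underspecify!)."""
    return all(
        field in response and isinstance(response[field], ftype)
        for field, ftype in contract.items()
    )

# Verify a well-formed reply; do NOT check that the "right" item was added.
good = {"cart_id": "c-123", "item_count": 3, "extra": "ignored"}
bad = {"cart_id": "c-123"}              # missing a field the consumer needs

assert satisfies(contract, good)
assert not satisfies(contract, bad)
```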
Property-Based Testing
Property-based testing is a derivative of the "formal specifications" ideas. It uses a model of the system to describe the allowed inputs, outputs, and state transitions. Then it randomly (but repeatably) generates a vast number of test cases to exercise the system. Instead of looking for success, property-based testing looks for failures. It detects states and values that could not have been produced according to the laws of the model, and flags those cases as failures.
The canonical property-based testing tool is QuickCheck. Many partial and fragmentary open-source tools claim to be "QuickCheck clones," but they lack two really important parts: search-space optimization and failure minimization. Search-space optimization uses features of the model to probe the "most important" cases rather than using a sheer brute-force approach. Failure minimization is an important technique for making the failures useful. Upon finding a failing case, minimization kicks in and searches for the simplest case that recreates the failure. Without it, understanding a failure case is just about as hard as debugging an end-user defect report.
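To make the generate-and-minimize loop concrete, here is a deliberately tiny hand-rolled sketch (the buggy function and all names are invented for illustration; real tools like QuickCheck do far more, including model-driven generation):

```python
# Hand-rolled property test with failure minimization (illustration only).
import random

def buggy_abs(x):
    # Deliberately wrong for -9 <= x <= -1: returns the negative value as-is.
    return x if x > -10 else -x

def holds(x):
    """The property under test: absolute value is never negative."""
    return buggy_abs(x) >= 0

def find_failure(trials=1000, seed=42):
    rng = random.Random(seed)           # repeatable (but random) generation
    for _ in range(trials):
        x = rng.randint(-20, 20)
        if not holds(x):
            return x
    return None

def minimize(x):
    """Failure minimization: shrink toward zero while the property
    still fails, so the report shows the simplest failing case."""
    while x < -1 and not holds(x + 1):
        x += 1
    return x

failing = find_failure()
assert failing is not None              # the property does fail somewhere
assert minimize(failing) == -1          # ...and the minimal failing case is -1
```

Without the `minimize` step, the report might show any of the nine failing inputs; with it, you always see the simplest one.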
Considerations when using property-based testing:
- Specifying the model is a specialized coding task. The model is often 10% of the size of the original system, but can still be very large.
- Test failures sometimes indicate a problem in the system and sometimes a problem in the model. Most business systems are not very well specified (in the rigorous CS sense of the term) and suffer from many edge cases. Even formal standards from international committees can be riddled with ambiguities and errors.
- If the system under test is not self-contained, then non-repeatable test failures can confuse the framework. Test isolation (e.g., mocks for all integration points) is really essential.
- This is not a cheap approach. Developing a robust model can take many months. If a company commits to this approach, then the model itself becomes a major investment. (In some ways, the model will be more of an asset than the code base, since it specifies the system behavior independently of any implementation technology!)
Fault Injection
Fault Injection is pretty much what it sounds like. You run the system under test in a controlled environment, then force "bad things" to happen. These days, "bad things" mostly means network problems and hacking attacks. I'll focus on the network problems for now.
One particular fault injection tool that has some interesting results lately is Jepsen. Jepsen's author, Kyle Kingsbury, has been able to demonstrate data loss in all of the current crop of eventually consistent NoSQL databases. You can clone that repo to duplicate his results.
Jepsen itself runs a bunch of VMs, then generates load. While the load is running against the system, Jepsen can introduce partitions and delays into the virtual network interfaces. By introducing controlled faults and delays in the network, Jepsen lets us try out conditions that can happen "in the wild" and see how the system behaves.
After running the test scenario, we use a validator to detect incorrect results.
- Jepsen itself doesn't provide much help for validation. As delivered, it just tries to store a monotonic sequence of integers. For application-specific tests, you must write a separate validator.
- Generating load needs to be predictable enough to verify the data and messages out from the system. That either means scripts or pseudo-random cases with a controlled seed.
- This is another test method that can't prove success, but can detect failures.
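An application-specific validator in this spirit might look like the following sketch (Jepsen itself is written in Clojure; everything here, including the event shapes, is an invented illustration): clients record which writes the database acknowledged, and after the partition heals we compare against a final read.

```python
# Hypothetical validator in the spirit of Jepsen's monotonic-integer test.

def validate(acknowledged, final_read):
    """Return the set of acknowledged writes missing from the final read.
    A non-empty result is a data-loss failure."""
    return set(acknowledged) - set(final_read)

acknowledged = [0, 1, 2, 3, 4, 5]       # the DB said "ok" to all of these
final_read = [0, 1, 2, 4, 5]            # ...but 3 vanished during the partition

lost = validate(acknowledged, final_read)
assert lost == {3}                       # detected: an acknowledged write was lost
```

As the last consideration says, an empty `lost` set proves nothing; only a non-empty one proves a failure.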
Simulation Testing
Simulation testing is the most repeatable of these methods. In simulation testing, we use a traffic model to generate a large volume of plausible "actions" for the system. Instead of just running those actions, though, we store them in a database.
The activity model is typically a small number of parameters to describe things like distribution of user types, ratio of logged-in to not-logged-in users, likelihood of new registrations, and so on. We use these parameters to create a database of actions to be executed later.
The event stream database will be reused for many different test runs, so we want to keep track of which version of the model and event generator were used to create it. This will be a recurring pattern with simulation testing: we always know the provenance of the data.
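A minimal sketch of this generation step, with invented parameter names and version identifiers, shows how a controlled seed plus recorded provenance makes the event stream reproducible:

```python
# Invented illustration: generate an action database from a small activity
# model, recording provenance so the same stream can be recreated exactly.
import random

MODEL_VERSION = "model-v3"              # hypothetical version identifiers
GENERATOR_VERSION = "gen-v1"

def generate_actions(params, seed):
    rng = random.Random(seed)           # controlled seed => repeatable stream
    actions = []
    for _ in range(params["n_actions"]):
        kind = "purchase" if rng.random() < params["purchase_ratio"] else "browse"
        actions.append({"kind": kind, "user": rng.randint(1, params["n_users"])})
    return {
        "provenance": {"model": MODEL_VERSION, "generator": GENERATOR_VERSION,
                       "seed": seed, "params": params},
        "actions": actions,
    }

params = {"n_actions": 1000, "n_users": 50, "purchase_ratio": 0.1}
stream1 = generate_actions(params, seed=7)
stream2 = generate_actions(params, seed=7)
assert stream1 == stream2               # same provenance => same event stream
```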
The simulation runner then executes the actions against the system under test. The system under test must be initialized with a known, versioned data set. (We'll also record the version of starting dataset that was used.) Because this runner is a separate program, we can turn a dial to control how fast the simulation runs. We can go from real time, to double speed, to one-tenth speed.
Where most test methods would verify the system output immediately, simulation testing actually just captures everything in yet another database. This database of outputs includes the final dataset when the event stream was completed, plus all the outputs generated by the system during the simulation. (Bear in mind that this "database" could just be log files.) These are the normal outputs of the system, just captured permanently.
Like everything else here, the output database is versioned. All the parameters and versions are recorded in the test record. That way we can tie a test report to exactly the inputs that created it.
Speaking of the test report, we run validations against the resulting dataset as a final process. Validations may be boolean checks for correctness (is the resulting value what was expected?) but they may also verify global properties. Did the system process all inputs within the allowed time? Were all the inputs accepted and acted on? Does "money in" balance with "money out"? These global properties cannot be validated during the simulation run.
An interesting feature of this model is that validations don't all need to exist when you run the simulation. We can think of new things to check long after the test run. For example, on finding a bug, we can create a probe for that bug and then find out how far back it goes.
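A sketch of such an after-the-fact validator (the event shapes and names are invented): because the outputs were captured rather than asserted on the fly, this check could be written months after the run and still be applied to old test records.

```python
# Invented illustration: a global-property validator run over the captured
# output database, long after the simulation itself finished.

output_db = [                           # events captured during an earlier run
    {"type": "payment-in", "amount": 100},
    {"type": "payment-out", "amount": 60},
    {"type": "payment-out", "amount": 40},
]

def total(events, kind):
    return sum(e["amount"] for e in events if e["type"] == kind)

def money_balances(events):
    """A global property: "money in" balances with "money out"."""
    return total(events, "payment-in") == total(events, "payment-out")

assert money_balances(output_db)
```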
Simulant is a tool for building simulation tests.
- This approach has some similarities to property-based testing. There are two major differences:
- Property-based testing has a complete formal model of the system used for both case generation and invariant checking. In contrast, simulation testing uses a much simpler model of incoming traffic and a separate set of invariants. Splitting these two things makes each one simpler.
- Property-based testing runs its assertions immediately. To add new assertions, you must re-run the whole test.
- Because assertions are run after the fact, you can verify global properties about the system. For example, does "money in" balance with "money out"?
- Using virtual machines and storage for the system under test makes it much simpler to initialize with a known good state.
- We can create multiple event streams to represent different "weather conditions." E.g., a product launch scenario versus just a heavy shopping day.
- The event stream doesn't need to be overly realistic. It's often better to have a naive transition model informed by a handful of useful parameters. Generate lots of data volume instead of finely modeled scenarios.
- When there are failures, they are usually very obvious from the assertions. The tougher part is figuring out how a bad final state happened. More output in the test record is better.
- If you squint, this looks like a load testing model. While it's possible to use this for load testing, it can be awkward. We've found a hybrid approach to be better: run a load generator for the "bulk traffic," then use simulation testing to check correctness under high load conditions. Of course, you sacrifice repeatability this way.