Mar 03 2009
It is now reasonable for some agile teams to fork most or all of their dependencies. Here's why:
A Brief History of Forking
Traditionally, forking a software project was a big deal. You forked when you disagreed about direction and philosophy. Disagreed a lot--so much that you were willing to make a big effort to do things your own way. And while forking didn't have to imply bad blood between forkers and those getting forked, that was sometimes the case as well.
So what is the big forking deal? Forking causes problems at several levels:
- API Compatibility: How does a prospective user of a library deal with multiple forks with differing APIs?
- Quality: If there are multiple forks of a project, how do you know which one is good?
- Configuration: How do you track dependencies when multiple forks of a project have the same name?
- Documentation: What's the difference between the forks?
And, of course, all of these problems increase for larger code bases--possibly faster than linearly with code size.
So, for a long time, it was safe to assume that forking was a big deal, and best employed rarely.
To understand what changed, it is best to go back and revisit the problems of forking and ask how they might be exacerbated or mitigated:
- API compatibility is a problem that grows if API differences are large, or expensive to discover. On the other hand, API changes are much less of a problem when they are small, or cheap to respond to.
- Evaluating code quality is a problem if QA is expensive, either for a library itself, or for the integration between that library and your system. On the other hand, if there were a cheap test that said "This fork of X works correctly with my project," then forking X yourself would look more reasonable.
- Configuration is a problem to the extent that it is expensive to track and manage forks. On the other hand, if you can easily review your dependencies, and switch between forks of a project in a few seconds, then configuration becomes less of a problem.
- Documentation is a problem. What if a fork fixes a problem, or adds a feature, but the documentation is not up to date? On the other hand, if there is a cheap way to discover what a fork does, then a buffet of forks to choose from can be a good thing.
So forking is more reasonable, to the extent that dealing with API compatibility, code quality, configuration, and documentation/discovery is cheap and fast. Different people can reasonably disagree about "how cheap" and "how fast," but everyone should be able to agree that there is a continuum, and that particular development practices could make forking easier or harder.
Reaching a Tipping Point
Over the last several years, we at Relevance have participated in several trends, each of which lowers the cost of forking.
- Good unit tests help with several of the items above. A good unit test suite will document the API and demonstrate the code's quality.
- Good integration tests make it cheap to discover API compatibility issues. A good integration test makes it easy to evaluate alternative configurations that use different forks of dependent libraries. Finally, a good integration test can quickly prove that fork X works with your project.
- Continuous integration helps you keep on top of the code health of a large number of projects. We developed RunCodeRun in part to keep all of our forked projects building cleanly.
- Low-ceremony languages make everything smaller. Code is smaller and the tests are smaller. Documentation can be done almost entirely in the code itself.
- Test-driven development helps produce good APIs between subsystems, by making bad ideas painful during development. Good APIs have a smaller surface area, and need to change less.
- Distributed version control systems such as Git make forking itself cheap and easy. More importantly, you can quickly update your configuration to switch between different forks. Also, it is much easier to manage merges and maintain your own personal fork that pulls needful pieces from multiple other forks.
- Relentless refactoring keeps code readable, so the prospect of evaluating a fork by reading or diffing its source code is much more palatable.
- Open source libraries and a culture of source-code deployment make it easy to manage all your dependencies at the source code level. Most of our Ruby projects vendor everything, so it is a simple Git operation to switch to a different commit (or a different fork!) of any dependency.
- Social sites like GitHub make it easy to discover and track forks of a particular project, and for many different forks to pull from each other. Tools like the Network Graph Visualizer show how your fork differs from other forks, and can help you quickly locate commits you may be interested in.
The combination of all these factors makes forking way easier than I would have believed possible, even a few years ago.
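To make two of these points concrete (switching a dependency to a different fork, and letting an integration test decide whether that fork works with your project), here is a minimal sketch in Python. It is an illustration only, not how any particular tool does it: the `Pin` record, the manifest dictionary, both function names, and the example URLs are all hypothetical, and `integration_check` stands in for whatever runs your real integration suite against the vendored code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Pin:
    """One vendored dependency: which fork it comes from, and at which commit."""
    fork_url: str  # hypothetical remote for the fork
    commit: str    # commit id to check out under vendor/


Manifest = Dict[str, Pin]


def switch_fork(manifest: Manifest, name: str, candidate: Pin) -> Manifest:
    """Return a copy of the manifest with `name` pinned to `candidate`."""
    if name not in manifest:
        raise KeyError(f"unknown dependency: {name}")
    updated = dict(manifest)
    updated[name] = candidate
    return updated


def first_working_fork(
    manifest: Manifest,
    name: str,
    candidates: Iterable[Pin],
    integration_check: Callable[[Manifest], bool],
) -> Optional[Manifest]:
    """Try each candidate fork in order; return the first manifest that passes.

    `integration_check` is an assumption: any callable that answers
    "does my project work with this set of dependencies?".
    Returns None if no candidate passes.
    """
    for candidate in candidates:
        trial = switch_fork(manifest, name, candidate)
        if integration_check(trial):
            return trial
    return None
```

The original manifest is never modified, so a failed experiment with a fork costs nothing: you simply keep using the manifest you had. In practice the check would be slow compared to this loop, which is why cheap, fast integration tests matter so much to the argument above.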
Fork Your Dependencies!
As a result of all these trends, we now regularly fork the third-party dependencies in our projects. Imagine the following scenario: You are nearing a project deadline, and you discover a bug in a third-party library. Here are some possible reactions:
- The closed-source way: Call a paid support line for your commercial software, explain your problem to a drone, and hope for a fix in the next release 18 months out.
- The open-source way: Contact the project maintainers and convince them of the justice of your cause. If that is taking too long, and you are in a language that enables it, monkey patch.
- The "Fork 'em!" way: Fork the project and fix it yourself. The original owners can debate your ideas in their own time (or not). Who cares? You are back up and running.
The cost of forking is now so low that it isn't even limited to fixing critical bugs. We fork to fix minor bugs, to add features, or to make usability improvements to APIs. In short, any kind of change we might make in our own code, we might also be willing to fork and make in somebody else's.
If you had asked me 18 months ago if the "fork 'em" approach would work, I would have said "no," and written you off as a crank. Luckily, nobody asked me! Rob just started doing it, and quickly showed that it worked.
Will this work for everybody? Absolutely not. You need to have a lot of other practices in place first. Otherwise, casual forking will only lead to trouble. But if you are running agile the right way, producing and consuming clean, tight, tested code, you may discover forking around is downright healthy for your code.