Rethinking PDF Creation in Ruby

Every now and then, a requirement comes up in a project that makes me second-guess my career choice as a programmer. It usually involves going through tedious exercises, never knowing along the way if I'll end up where I want to be.

This happened a couple of weeks ago when one of our projects called for generating PDF reports. The reports needed many stylized elements, layouts, and dynamic graphs. If you've ever generated PDFs in Ruby before, you know that it can be both tedious and difficult using the standard go-to PDF libraries out there. Let's face it, we're web developers. Coming from HTML+CSS-based layouts, writing Ruby code for that stuff is a major pain.

To give you an idea of how heavy it can get, here's an example taken from Prawn. The example was ironically called simple_table.rb.

  Prawn::Document.generate("simple_table.pdf") do

    table([["foo", "bar " * 15, "baz"],
           ["baz", "bar", "foo " * 15]], :cell_style => {:padding => 12}) do
      cells.borders = []

      # Use the row() and style() methods to select and style a row.
      style row(0), :border_width => 2, :borders => [:bottom]

      # The style method can take a block, allowing you to customize
      # properties per-cell.
      style(columns(0..1)) { |cell| cell.borders |= [:right] }
    end

    move_down 12

    table([%w[foo bar bazbaz], %w[baz bar foofoo]],
          :cell_style => { :padding => 12 }, :width => bounds.width)

  end

If you're scratching your head at this point, the code above generates a PDF with two simply styled tables. That's it. If you asked me to implement this in an app, I might have something halfway presentable in an hour. But I could get a monkey who just drank a whole bottle of scotch (don't ask about his drinking problem) to write two tables in HTML in less than 5 minutes.
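For comparison, here's a rough sketch of what the HTML route could look like for the same two tables. The html_table helper and the CSS rules are made up for illustration; they aren't part of Prawn or any library, and the styling only approximates what the Prawn example does.

```ruby
require "erb"

# Hypothetical helper: turn an array of rows into an HTML table string.
def html_table(rows)
  body = rows.map do |row|
    cells = row.map { |cell| "<td>#{ERB::Util.html_escape(cell)}</td>" }.join
    "<tr>#{cells}</tr>"
  end
  "<table>\n#{body.join("\n")}\n</table>"
end

# Styling lives in CSS instead of Ruby method calls (illustrative rules only).
style = "td { padding: 12px; } tr:first-child td { border-bottom: 2px solid black; }"

html = [
  "<style>#{style}</style>",
  html_table([["foo", "bar " * 15, "baz"], ["baz", "bar", "foo " * 15]]),
  html_table([%w[foo bar bazbaz], %w[baz bar foofoo]])
].join("\n")

puts html
```

The markup is something you can open in a browser and tweak with the tools you already use every day.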

A New Hope

Now some of you may be familiar with PrinceXML, a command-line utility that takes HTML+CSS and gives you back a beautiful PDF. It's even CSS2 compatible and passes the Acid2 test. Awesome. The only problem is that a single server license will set you back $3,800, which is prohibitively not awesome.

Being the open source zealots we are here at Relevance, we set out to find another solution. Tucked away in the internets, we stumbled across wkhtmltopdf. I know what you're thinking: awesome name, huh? wkhtmltopdf uses the WebKit rendering engine to make pretty PDFs out of HTML+CSS. Since it's leveraging WebKit, you get all the tasty CSS3 properties it supports. Ugly PDFs are suddenly a thing of the past.
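Since wkhtmltopdf is a command-line tool, driving it from Ruby by hand boils down to building an argument list and shelling out. Here's a minimal sketch, assuming wkhtmltopdf is installed and on your PATH; the wkhtmltopdf_command helper and the file names are hypothetical.

```ruby
require "shellwords"

# Hypothetical helper: build the argument list for a wkhtmltopdf call.
# The general shape is: wkhtmltopdf [flags] <input> <output>
def wkhtmltopdf_command(input, output, flags = [])
  ["wkhtmltopdf", "--quiet", *flags, input, output]
end

cmd = wkhtmltopdf_command("report.html", "report.pdf", ["--page-size", "Letter"])

# Print the command as it would appear in a shell.
puts Shellwords.join(cmd)

# To actually render (needs the wkhtmltopdf binary):
#   system(*cmd) or raise "wkhtmltopdf failed"
```

Passing the arguments to system as separate strings, rather than as one interpolated string, keeps odd characters in file names from being interpreted by the shell.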

Goodbye Prawn, Hello PDFKit

We were surprised that none of us had ever heard of wkhtmltopdf, considering how useful it is. When we looked for a Ruby library that leveraged it, we realized it didn't exist. Apparently not a whole lot of other people had heard of it either. That couldn't stand. A couple of open-source Fridays and several gallons of Mountain Dew later, we're excited to announce PDFKit, an open source library that makes working with wkhtmltopdf a snap.

Usage

Inline HTML+CSS => PDF

  kit = PDFKit.new("<h1>Oh Hai!</h1>")
  kit.stylesheets << '/path/to/pdf.css'
  kit.to_pdf # inline PDF
  kit.to_file('/path/to/save/pdf')

HTML file => PDF

  html_file = File.new('/path/to/html')
  kit = PDFKit.new(html_file)
  kit.to_pdf # inline PDF

Remote HTML => PDF

  kit = PDFKit.new("http://google.com")
  kit.to_pdf # inline PDF

What's the big deal?

If this hasn't sunk in yet, let's go over a quick list of wins this buys us:

  1. HTML+CSS - Assuming you're a web developer, there's a good chance that you already know HTML and can work with it efficiently.
  2. CSS3 - We get WebKit's CSS3 support for free. This means effects like drop shadows, rounded borders, transformations and others are super-easy. (Note: effects requiring a blur radius do not work.)
  3. Testing - We have tools built into our normal workflow for testing HTML. You can even use Cucumber to drive the development of a PDF with PDFKit.

To give you an idea of how well this fits into our normal workflow here at Relevance, this is how we built out our PDF reports:

  1. Our designer mocked up a sample PDF and converted it to HTML+CSS.
  2. Using Cucumber to drive development, we created a controller action to generate this HTML view of the PDF. (It was just another URL in our app.)
  3. We added a screen-only stylesheet to the HTML that mimics the look of a PDF reader. This allowed us to get a feel for how it would look as a PDF.
  4. Using a bit of Rack middleware that ships with PDFKit, we can get the PDF version of that web page by simply appending '.pdf' to the URL.
  5. We're done. No crazy extra class to handle PDF rendering. No need to spend all day reading through docs to learn the obscure code and magic incantations required to generate your PDF.
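To make step 4 a little less magical, here's a sketch of the idea behind a '.pdf' middleware. This is not PDFKit's actual implementation: PdfSuffixMiddleware, the stand-in app, and the fake renderer are all hypothetical, and the sketch avoids Rack and wkhtmltopdf entirely so it runs on its own.

```ruby
# Sketch of a ".pdf" middleware: if the URL ends in .pdf, fetch the HTML
# version of the same URL from the wrapped app and convert the response.
class PdfSuffixMiddleware
  def initialize(app, renderer)
    @app = app
    @renderer = renderer # any callable: HTML string -> PDF bytes
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    return @app.call(env) unless path.end_with?(".pdf")

    # Ask the wrapped app for the plain HTML page.
    html_env = env.merge("PATH_INFO" => path.sub(/\.pdf\z/, ""))
    status, headers, body = @app.call(html_env)

    html = +""
    body.each { |chunk| html << chunk }
    pdf = @renderer.call(html)

    headers = headers.merge("Content-Type" => "application/pdf",
                            "Content-Length" => pdf.bytesize.to_s)
    [status, headers, [pdf]]
  end
end

# Stand-ins so the sketch runs without a real app or PDF renderer.
app = ->(env) { [200, { "Content-Type" => "text/html" }, ["<h1>#{env["PATH_INFO"]}</h1>"]] }
fake_renderer = ->(html) { "%PDF-FAKE #{html}" }
stack = PdfSuffixMiddleware.new(app, fake_renderer)

status, headers, body = stack.call("PATH_INFO" => "/reports/1.pdf")
puts headers["Content-Type"]
puts body.first
```

In a real app the renderer could be something like ->(html) { PDFKit.new(html).to_pdf }, though in practice you'd just use the middleware that ships with PDFKit rather than rolling your own.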

Samples

  • PDF of google.com - PDF rendered from http://google.com
  • CSS3 Examples - Sample rendering of common CSS3 effects including border-radius, text-shadow, box-shadow, and border-image. Notice the lack of a blur radius on text-shadow and box-shadow.
  • Sample HTML page with PDF viewer CSS - Example of using a single HTML source to render both a screen version and a PDF version. Uses media="screen" and media="all" attributes to mark the relevant CSS.
  • PDF generated from PDF viewer HTML - PDF generated from sample HTML above. You must tell PDFKit to only use print stylesheets in order to achieve this effect (PDFKit.new(html, :print_media_type => true)).
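An option like :print_media_type => true ultimately has to reach wkhtmltopdf as a command-line switch (--print-media-type). Here's a rough sketch of how a wrapper might do that translation; options_to_flags is a hypothetical helper written for this post, not PDFKit's real code.

```ruby
# Hypothetical helper: map Ruby-style options to wkhtmltopdf-style flags.
def options_to_flags(options)
  options.flat_map do |name, value|
    flag = "--" + name.to_s.tr("_", "-")
    case value
    when true then [flag]        # boolean switch, takes no argument
    when false, nil then []      # switched off: leave it out entirely
    else [flag, value.to_s]      # flag followed by its value
    end
  end
end

p options_to_flags(:print_media_type => true, :page_size => "Letter")
```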

Go Forth and PDF

I encourage you to take PDFKit for a spin, let us know what you think, and even submit some patches.

If you are pumped about the possibility of using PDFKit on a future project, then I've achieved my goal. If not, I'd ask you to think about what is missing, find out if it's already out there, and let us know how to make PDFKit even better.