Diplograph

Visual Diffs

October 2009

I accidentally pushed a couple of CSS changes that broke an old post, and I didn't catch it for a few hours.

That's not good! I don't want to be randomly breaking old content because I forgot how bad my site's markup is. So instead of working on the Japan trip photos I've been ignoring all week, I wrote a script that lets me know when my site visually changes.

Here's what it looks like:

I accidentally put a CSS rule in the wrong place and all my text turned bold. The live version of the site is on the left, the development version with the change in the middle, and the differences are highlighted in red on the right.

Another example of what gets shown when something changes. In this case, the left image of a side-by-side portrait layout has been accidentally shifted down one line.

Unfortunately, the script is closely tied to the custom engine that powers this site, and it's probably not useful to anyone else. Instead, I'd like to share the core snippets that make it work. It's written in Ruby, but most of it is glue that calls other tools.

Capturing Rendered Pages

First, I set up two lighttpd servers on different ports. If I visit my server on port 3000, I get the development version of the site; port 3001 serves a copy of the currently published version.
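The post doesn't include the server setup. As a minimal sketch, assuming each copy of the site is a directory of static files (the directory paths below are made up), a hypothetical helper like `lighttpd_config` could generate one config per port. Each server would then be started with `lighttpd -D -f dev.conf`, where `-D` keeps lighttpd in the foreground.

```ruby
# Hypothetical helper: build a minimal lighttpd config that serves a
# directory of static files on 127.0.0.1 at the given port.
def lighttpd_config(port, document_root)
  <<~CONF
    server.bind          = "127.0.0.1"
    server.port          = #{port}
    server.document-root = "#{document_root}"
    index-file.names     = ( "index.html" )
    mimetype.assign      = ( ".html" => "text/html",
                             ".css"  => "text/css",
                             ".png"  => "image/png",
                             ".jpg"  => "image/jpeg" )
  CONF
end

# Example usage (paths are assumptions):
#   File.write('dev.conf',  lighttpd_config(3000, '/path/to/dev/site'))
#   File.write('live.conf', lighttpd_config(3001, '/path/to/published/site'))
```

Keeping both servers bound to 127.0.0.1 means the development copy is never reachable from outside the machine.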

I'm using Paul Hammond's webkit2png script to save webpages as PNG images.

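require 'fileutils'

# Capture each page from the development server (port 3000) and the
# live copy (port 3001), renaming webkit2png's output so it isn't overwritten
pages.each do |page|
  system './webkit2png', '-W1224', '-F', '-o', page, "http://127.0.0.1:3000/#{page}"
  FileUtils.mv "#{page}-full.png", "#{page}.dev.png"
  system './webkit2png', '-W1224', '-F', '-o', page, "http://127.0.0.1:3001/#{page}"
  FileUtils.mv "#{page}-full.png", "#{page}.live.png"
end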

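The -W option sets the width of the window used to capture the page, the -F option means only a full-size image is saved, and -o sets the base file name that webkit2png appends -full.png to, which is why each capture is renamed straight afterwards.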
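Because the `-o` value becomes part of a file name, a page path containing slashes (a nested URL such as `2009/10/visual-diffs`, used here only as an example) would likely point webkit2png at subdirectories that may not exist. One way around that, sketched below with hypothetical helper names, is to flatten the path first and derive every file name from it.

```ruby
# Hypothetical helper: turn a page path such as "2009/10/visual-diffs"
# into a flat name that is safe to use as a webkit2png -o prefix.
def flat_name(page)
  name = page.sub(%r{\A/+}, '').sub(%r{/+\z}, '').gsub(/[^A-Za-z0-9._-]+/, '-')
  name.empty? ? 'index' : name   # the site root has no path of its own
end

# Every file the comparison produces for one page, derived from one name.
def capture_files(page)
  base = flat_name(page)
  {
    raw:  "#{base}-full.png",   # what webkit2png writes with -F -o base
    dev:  "#{base}.dev.png",
    live: "#{base}.live.png",
    diff: "#{base}.diff.png"
  }
end
```

One caveat of this scheme: `a/b` and `a-b` flatten to the same name, so a site with both would need a different separator.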

So now for each page in my site I have two images, one of my development copy and one of the live site. The index page, for example, has index.dev.png and index.live.png. If the two images are the same, then I know the page hasn't changed. If they're different, I want to know where.
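A cheap pre-check that isn't part of the original script: if the two PNG files are byte-for-byte identical, the page can't have changed, and the slower ImageMagick steps can be skipped. The reverse doesn't hold, since two files with different bytes can still decode to the same pixels, so anything that fails this check still goes through compare. `needs_compare?` is a hypothetical name; `FileUtils.compare_file` comes from Ruby's standard library.

```ruby
require 'fileutils'

# True unless the two screenshots are byte-for-byte identical.
# Identical files mean the page certainly hasn't changed; different
# bytes don't prove a visual change, so those pages still need compare.
def needs_compare?(dev_png, live_png)
  !FileUtils.compare_file(dev_png, live_png)
end
```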

Comparing The Pages

I'm using ImageMagick to compare the images. I noticed ImageMagick has trouble comparing large images of different sizes (I think it's trying to do some sort of complicated partial image alignment). I've found that if I first force the images to be the same size, it usually works better.

# Ask ImageMagick's identify for an image's width and height
def dimensions(image)
  `identify #{image}` =~ / (\d+)x(\d+) /
  return $1.to_i, $2.to_i
end

dev_name, live_name = "#{page}.dev.png", "#{page}.live.png"
dev_width, dev_height = dimensions dev_name
live_width, live_height = dimensions live_name

# If the sizes differ, pad both images to a canvas large enough for either
if dev_width != live_width || dev_height != live_height
  max_width, max_height = [dev_width, live_width].max, [dev_height, live_height].max
  system 'mogrify', '-extent', "#{max_width}x#{max_height}", dev_name
  system 'mogrify', '-extent', "#{max_width}x#{max_height}", live_name
end

The identify command, part of ImageMagick, prints out some basic information about the image, including its dimensions. mogrify, also part of ImageMagick, modifies an image in place. I've asked it to change the image's bounds without scaling or stretching it (think cropping, or Canvas Size if you know Photoshop). By default, the new pixels are filled in with white. Since Diplograph's background is white, this works fine.
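The mogrify call relies on two defaults: the padding is white and, as far as I know, the original image stays pinned to the top-left corner. A variant that states both explicitly with ImageMagick's `-background` and `-gravity` settings is sketched below as a pure function that only builds the argument lists (the name `padding_commands` is hypothetical), which also makes the padding logic easy to check without running ImageMagick.

```ruby
# Build mogrify argument lists that pad two images onto a shared canvas.
# -background and -gravity spell out the fill color and the anchor corner
# instead of relying on ImageMagick's defaults.
def padding_commands(dev_name, dev_size, live_name, live_size)
  return [] if dev_size == live_size          # already the same size
  width  = [dev_size[0], live_size[0]].max
  height = [dev_size[1], live_size[1]].max
  [dev_name, live_name].map do |name|
    ['mogrify', '-background', 'white', '-gravity', 'NorthWest',
     '-extent', "#{width}x#{height}", name]
  end
end

# Usage: padding_commands(dev_name, [w1, h1], live_name, [w2, h2]).each { |args| system(*args) }
```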

Then the compare tool (also part of ImageMagick) does the actual visual comparison. It writes a third image, the diff, with the changed pixels highlighted in red.

system 'compare', "#{page}.dev.png", "#{page}.live.png", "#{page}.diff.png"
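The script as described only produces the highlighted image. If a yes/no answer per page is also wanted, compare can report a metric: with `-metric AE` it counts the pixels that differ and prints that number on stderr. A sketch using Ruby's `Open3`, with hypothetical helper names; output formatting has varied between ImageMagick versions, so the parser only reads a leading number and returns nil for anything else, such as an error message.

```ruby
require 'open3'

# Run compare with the AE (absolute error) metric, which counts differing
# pixels; compare prints the metric on stderr, not stdout.
def changed_pixels(dev_png, live_png, diff_png)
  _out, err, _status = Open3.capture3('compare', '-metric', 'AE',
                                      dev_png, live_png, diff_png)
  parse_metric(err)
end

# Read a leading number such as "1523" or "1.52335e+06" from compare's
# stderr. Returns nil when the text doesn't start with a number.
def parse_metric(text)
  match = text.match(/\A\s*(\d+(?:\.\d+)?(?:e[+-]?\d+)?)/i)
  match && Float(match[1]).round
end
```

A count of zero means the page can be left out of the report. Adding `-fuzz 1%` (or similar) before the file names would make compare ignore tiny color differences.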

Generating the Report

With the three images saved on disk, it's pretty easy to generate a little bit of HTML and open a browser with the results. However, I found that things weren't snappy with three 15MB images on the page (some pages on the site are pretty big), so as a final step I created smaller versions for the side-by-side comparisons:

# Make 380-pixel-wide thumbnails of the dev, live, and diff images
# (double quotes, so Ruby interpolates the page name)
%w[dev live diff].each do |kind|
  system 'convert', '-resize', '380', "#{page}.#{kind}.png", "#{page}.#{kind}.small.png"
end

convert is another ImageMagick command. The -resize option tells it to scale the image (think Image Size in Photoshop); giving only a width of 380 makes it 380 pixels wide and scales the height to keep the aspect ratio.
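For a sense of how much smaller the thumbnails are: a capture that is 1224 pixels wide (the -W1224 window width) shrinks by a factor of 380/1224, about 0.31, in both directions. The hypothetical helper below does that arithmetic; ImageMagick does its own rounding, so treat the height as an estimate.

```ruby
# Estimate the size of a thumbnail made with `convert -resize 380`:
# the width becomes 380 and the height is scaled by the same factor.
def thumbnail_size(width, height, target_width = 380)
  scale = target_width.to_f / width
  [target_width, (height * scale).round]
end

# thumbnail_size(1224, 3060) gives [380, 950]
```

Putting these numbers in the `<img>` tags' `width` and `height` attributes lets the browser lay out the report before the thumbnails load. Note that a bare `380` also enlarges narrower images; ImageMagick's `380>` geometry only ever shrinks.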

That's pretty much the guts of the script. In total, it's about 100 lines of code, and about half of that is the HTML templates for the reports.
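The post doesn't show its HTML templates. A minimal sketch of one report row using Ruby's standard ERB library: live on the left, development in the middle, and the diff on the right, matching the layout described earlier, with each thumbnail linking to its full-size capture. The template, `report_html`, and the exact markup are assumptions; the `*.small.png` names follow the snippets in this post.

```ruby
require 'erb'

# One table row per page. The single-quoted heredoc keeps #{...} for ERB
# to evaluate at render time; ERB::Util.h escapes characters like & and <.
ROW_TEMPLATE = ERB.new(<<~'HTML', trim_mode: '-')
  <tr>
    <th><%= ERB::Util.h(page) %></th>
  <%- %w[live dev diff].each do |kind| -%>
    <td><a href="<%= ERB::Util.h("#{page}.#{kind}.png") %>"><img src="<%= ERB::Util.h("#{page}.#{kind}.small.png") %>" width="380"></a></td>
  <%- end -%>
  </tr>
HTML

# Hypothetical report builder: wrap the rows for every changed page in a table.
def report_html(pages)
  rows = pages.map { |page| ROW_TEMPLATE.result_with_hash(page: page) }.join
  "<!DOCTYPE html>\n<table>\n#{rows}</table>\n"
end
```

Writing the result to a file and opening it with the system browser would complete the loop the post describes.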

So

I've already found the script useful for validating the changes I make to the site. Between it and the textual diffs I usually scan before pushing out an update, I feel a lot better knowing my changes aren't going to affect old content in unexpected ways.

Unfortunately, it's really slow: it takes about four and a half minutes in the worst case on this fairly small site. webkit2png is not nearly as efficient as I'd like, and I'm probably going to rewrite it as a Cocoa tool. A lot of the image work currently done by ImageMagick should move into that application as well, since compare still seems to occasionally fail on large images.

I'm also thinking of giving Grand Central Dispatch a shot, since I'm really only using one of my laptop's two cores right now.
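Grand Central Dispatch is a C-level API, so from a Ruby script a simpler route is plain Ruby threads. This is a swapped-in technique, not the plan in the post: a minimal worker pool, with hypothetical names, that feeds pages to a fixed number of threads. Because the expensive work happens in child processes (webkit2png, compare, convert), and current MRI releases its global lock while waiting on a child process, two workers could keep both cores busy, assuming those tools don't interfere with each other when run side by side.

```ruby
# Run a block over every page using a fixed number of worker threads and
# return a hash of page => block result.
def each_page_in_parallel(pages, workers: 2, &block)
  queue = Queue.new
  pages.each { |page| queue << page }
  workers.times { queue << :done }        # one stop marker per worker
  results = {}
  lock = Mutex.new
  threads = Array.new(workers) do
    Thread.new do
      while (page = queue.pop) != :done
        value = block.call(page)
        lock.synchronize { results[page] = value }
      end
    end
  end
  threads.each(&:join)                    # re-raises any worker exception
  results
end

# Usage: each_page_in_parallel(pages) { |page| system('compare', ...) }
```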

Maybe I'll get to this on some future Sunday when I'm ignoring vacation photos.