Saturday, December 29, 2007

Automatic Asset Minimization and Packaging with Rails 2.0.x

With the recent release of Rails 2.0, many of us are reviewing our approaches to common problems. Many new features have been added to Rails, and some old tricks are either no longer necessary or no longer work.

I am developing a project with Rails 2.0 and am getting close to putting it into production. A recurring issue for today's web developers is that of asset packaging, or the combination of multiple site assets into a single file. Specifically, we're talking about Javascript and CSS.

A given "Web 2.0" (a term I wish had recently been found dead in a cramped apartment in Brooklyn) site might have a half dozen Javascript or CSS files to deliver to a user, and web browsers are not all that efficient at retrieving them. Each one requires a separate HTTP request to the server, and many browsers will only fetch two of these files from the same server concurrently. This means delays for your users.

In Rails 2.0 (and previously in Edge Rails), it's possible to combine multiple Javascript and CSS files using the javascript_include_tag and stylesheet_link_tag functions in your html.erb files; simply add :cache => true to the parameters like this:

<%= javascript_include_tag 'prototype', 'effects', :cache => true %>
<%= stylesheet_link_tag 'main', 'shop', 'form', :cache => true %>

With :cache => true and when running in your production environment, Rails will automatically combine your Javascript and CSS assets into single files (all.js and all.css, respectively) and significantly reduce your site's load time.

However, this really only solves part of the problem. A common technique used to further improve site performance is to compress Javascript and CSS by removing unnecessary whitespace and comments. I am not sure why this wasn't included as part of Rails' built-in caching features, but it seemed to me it should be easy to add.

Turns out I was mostly right. Google "javascript minimization" (or minification) and you'll see it's a pretty hot topic. The Asset Packager plugin from Scott Becker does this, as well as CSS compression, but is targeted at Rails 1.x and doesn't really make sense in the face of Rails 2.0.

So I set out to solve this problem in an elegant way for Rails 2.0. Asset Packager uses a Ruby script called jsmin.rb by Uladzislau Latynski, which is based on jsmin.c by Douglas Crockford. The thing is, jsmin.rb is not a class or library, but rather a standalone executable that operates on stdin and stdout. Asset Packager actually spawns a separate Ruby process to do its Javascript minimization, and this seemed like folly if it could be done inside Rails.

Accordingly, I modified jsmin.rb to operate as a singleton class and with a class method you could pass Javascript data to. Then it was simply a matter of monkey patching this function into ActionView::Helpers::AssetTagHelper, home of javascript_include_tag and stylesheet_link_tag.
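The refactoring is easier to see in miniature. jsminlib.rb itself isn't reproduced here, so the sketch below uses a deliberately trivial, hypothetical filter (ToyFilter, which only trims lines and drops blank ones, nothing like real jsmin) to show the general move: a script that reads one IO and writes another can be wrapped in a class method by handing it StringIO objects instead of $stdin and $stdout, so no separate Ruby process is needed. ToyFilter and ToyMin are illustrative names, not part of jsmin.rb.

```ruby
require 'stringio'

# Hypothetical stand-in for a stdin/stdout filter script. Real jsmin
# understands strings, regex literals, and comments; this toy only trims
# each line and drops blank ones.
module ToyFilter
  def self.run(input, output)
    input.each_line do |line|
      stripped = line.strip
      output.puts(stripped) unless stripped.empty?
    end
  end
end

# Class-method wrapper: String in, String out, no child process.
class ToyMin
  def self.minimize(source)
    output = StringIO.new
    ToyFilter.run(StringIO.new(source), output)
    output.string
  end
end

puts ToyMin.minimize("  var a = 1;\n\n  var b = 2;\n")
```

The same wrapping works for any filter written against generic IO objects; only the entry point changes.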

I also wanted to add in CSS compression, which turned out to be easy. The javascript_include_tag and stylesheet_link_tag functions both use the same underlying functions to package their assets, so it was a simple case of replacing them with equivalents that do compression appropriately, based on whether we are dealing with CSS or JS.


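module ActionView
  module Helpers
    module AssetTagHelper
      require 'jsminlib'  # provides JSMin.minimize (lib/jsminlib.rb)

      # Squeeze whitespace and comments out of a CSS string.
      def compress_css(source)
        source.gsub!(/\s+/, " ")           # collapse space
        source.gsub!(/\/\*(.*?)\*\/ /, "") # remove comments
        source.gsub!(/\} /, "}\n")         # add line breaks
        source.gsub!(/\n$/, "")            # remove last break
        source.gsub!(/ \{ /, " {")         # trim inside brackets
        source.gsub!(/; \}/, "}")          # trim inside brackets
        source  # gsub! returns nil when nothing matched, so return the string itself
      end

      # Read an asset and compress it according to its type.
      def get_file_contents(filename)
        contents = File.read(filename)
        if filename =~ /\.js$/
          JSMin.minimize(contents)
        elsif filename =~ /\.css$/
          compress_css(contents)
        else
          contents
        end
      end

      # Replaces the Rails 2.0 version, which called File.read directly.
      def join_asset_file_contents(paths)
        paths.collect { |path|
          get_file_contents(File.join(ASSETS_DIR, path.split("?").first)) }.join("\n\n")
      end
    end
  end
end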


By simply modifying join_asset_file_contents to use our new get_file_contents function instead of File.read, we quickly get to the heart of the matter. CSS files get compress_css run on them, while Javascript files get JSMin.minimize run on them. Your :cache => true Javascript and CSS assets will now be gloriously combined and compressed!
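To see what compress_css does to a stylesheet, here is a standalone copy of the same six substitutions, lifted out of the helper module so it can run on its own. Unlike the version in the patch, it works on a copy of its argument (via dup) instead of modifying the string in place; the sample CSS is made up for illustration.

```ruby
# Standalone copy of compress_css for experimenting outside Rails.
def compress_css(source)
  css = source.dup
  css.gsub!(/\s+/, " ")           # collapse all whitespace runs to one space
  css.gsub!(/\/\*(.*?)\*\/ /, "") # remove comments (each followed by a space)
  css.gsub!(/\} /, "}\n")         # one rule per line
  css.gsub!(/\n$/, "")            # drop the trailing line break
  css.gsub!(/ \{ /, " {")         # no space after an opening brace
  css.gsub!(/; \}/, "}")          # drop the last semicolon before a closing brace
  css
end

sample = <<~CSS
  /* header */
  h1 {
    color: red;
  }

  p { margin: 0; }
CSS

puts compress_css(sample)
```

For this input the result is "h1 {color: red}" and "p {margin: 0}" on two lines: the comment is gone, each rule sits on one line, and the spaces and final semicolons inside the braces are trimmed.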

Note that the above monkey patch requires jsminlib.rb, which you can download here. It is just a modified version of the original jsmin.rb, and you will want to put it into your Rails lib directory.

A good next step would be to further enhance get_file_contents to do Javascript obfuscation, which allows for the replacement of variable names and thus even further compression; it also tends to make Javascript code nearly incomprehensible and thus harder to steal, which may be desirable for some developers. I haven't found any native Ruby ways to do this yet, but it seems to me that this would be a good place for a C extension (or similar), and that this should all be put into a tiny and lightweight plugin.

I'm always amazed at how easy it is to bend Rails (and Ruby) to one's will, and in this case it's really quite elegant and straightforward. I'd love to hear your ideas about how to take this idea forward, potentially even including it in Rails itself.

Download the files here:

Wednesday, November 14, 2007

Hacking A Program to Feed the World

While I was working on some changes to Twittervision yesterday, I saw someone mention a site where you can quiz yourself on vocabulary words and help feed the world. How? Each word you get right donates 10 grains of rice to, one hopes, someone who needs it.

The idea is that you will sit there for hours and look at the advertising from the do-gooder multinationals who sponsor it. Which I did for a while. I got up to level 44 or so and got to feeling pretty good about Toshiba and Macy's.

It occurred to me though that my computer could also play this game, and has a much better memory for words than I do. In fact, once it learns something, it always chooses the right answer.

So I wrote a program to play the vocabulary game. In parallel. 50 browsers at a time. Sharing what they learn with each other. Cumulatively.

It's a multithreaded Ruby program using WWW::Mechanize and Hpricot. Nothing terribly fancy, but it does learn from each right and wrong answer, and after just a few minutes seems to hit a stride of about 75-80% accuracy. And a rate of about 200,000 grains of rice per hour (depending on the speed of your connection).
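The script itself is only offered as a download, and I don't know the quiz site's page structure, so this is an offline sketch of just the learning part: several threads answer a simulated quiz and share one Mutex-guarded table of answers, so anything one worker learns helps all the others. It assumes, as a simplification, that the quiz reveals the right answer after every guess; the words, choices, and class names are hypothetical, and no WWW::Mechanize or Hpricot calls are involved.

```ruby
# Offline simulation only: no network, no WWW::Mechanize or Hpricot.
# The words, choices, and class names below are hypothetical.

# Answer table shared by every worker thread, guarded by a Mutex.
class SharedMemory
  def initialize
    @known = {}
    @lock = Mutex.new
  end

  def lookup(word)
    @lock.synchronize { @known[word] }
  end

  def learn(word, answer)
    @lock.synchronize { @known[word] = answer }
  end

  def size
    @lock.synchronize { @known.size }
  end
end

# The simulated "site": each word has one right answer among four choices.
ANSWERS = {
  "affable" => "friendly", "laconic" => "terse",
  "torpid" => "sluggish", "ebullient" => "cheerful"
}.freeze
CHOICES = {
  "affable" => %w[friendly angry tired wet],
  "laconic" => %w[wordy terse loud late],
  "torpid" => %w[quick sluggish hot sour],
  "ebullient" => %w[gloomy cheerful heavy dull]
}.freeze

# One worker: answer from shared memory when possible, otherwise guess.
# Right or wrong, the quiz reveals the answer, which is then shared.
def play(memory, rounds, rng)
  correct = 0
  rounds.times do
    word = ANSWERS.keys.sample(random: rng)
    guess = memory.lookup(word) || CHOICES[word].sample(random: rng)
    correct += 1 if guess == ANSWERS[word]
    memory.learn(word, ANSWERS[word])
  end
  correct
end

memory = SharedMemory.new
workers = 5.times.map { |i| Thread.new { play(memory, 20, Random.new(i)) } }
scores = workers.map(&:value)
puts "correct per worker: #{scores.inspect}, words learned: #{memory.size}"
```

Accuracy climbs as the shared table fills up; once every word is known, each answer is right.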

UPDATE: With some tuning, the script is now able to push out about 600,000 grains of rice per hour, which, according to the statistic of 20,000 grains per person per day, is enough to feed 720 people per day! If one thousand people run this script, it will (allegedly) generate enough to feed 720,000 people per day.
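The arithmetic behind those figures, spelled out using the rates quoted above:

```ruby
grains_per_hour = 600_000           # script throughput after tuning
grains_per_person_per_day = 20_000  # the quoted daily ration

grains_per_day = grains_per_hour * 24  # 14,400,000 grains
people_per_script = grains_per_day / grains_per_person_per_day
people_for_thousand_scripts = people_per_script * 1_000

puts "one script: #{people_per_script} people/day"
puts "1,000 scripts: #{people_for_thousand_scripts} people/day"
```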

Before you go off on me, disclaimer: Yes, I realize this program subverts the intent of the site. I've released this not to "game" but simply to show a flaw in their design and have a little fun at the same time. If what they are after is human interaction, this design doesn't mandate it. That's all I'm saying.

Run it for a while and see how many people you can feed!


  • Ruby (Linux, OS X, Other)
  • Rubygems
  • gem install mechanize --include-dependencies

Download the code

Saturday, November 10, 2007

Concurrent Erlang: Watch out, API developers!

Continuing my theme of lifting ideas from Dave Thomas' blog posts, in our last episode we built a somewhat broken program to sequentially fetch feed information from Youtube's XML API.

I had some trouble understanding why I wasn't getting #xmlText records back from xmerl_xpath. Thanks to a comment by Ulf Wiger, I now understand what was going wrong.

As many of us do, I was using the shell to play around with ideas before committing them to my program, and in the shell the record format is not defined because xmerl.hrl is not loaded. The header *was* included in my program, but my experiments in the shell weren't running inside the program, so the shell never saw the record definitions.

I took his advice, used the force, and got my patterns to match #xmlText records.

I also copied Dave Thomas' design pattern for parallel spawning of the fetch process to produce this program which 1) grabs a feed of the most_viewed videos on YouTube, and then 2) grabs in parallel the user profiles for each of those videos.

While I still only have a rudimentary understanding of the language, I at least understand everything that's going on in this program. It's amazing how quick concurrent programs in Erlang can be. The fetch_parallel function in this program runs in about 3 seconds, while the fetch_sequential version takes about 20 seconds.

If you think about what this means for API developers, it has scary implications. In short, they will need a lot more bandwidth and processing capacity to deal with concurrent clients than is presently needed to deal with a sequential universe. Most API developers are accustomed to interacting with programs that make a single request, do some processing, and then make another related request.

A world of Erlang-derived, concurrent API clients likely calls for Erlang-derived concurrent API servers. Today's API interactions are timid, one-off requests compared to what's possible in a concurrent API interaction.

Imagine a recursive program designed to spider through API data until it finds the results it's looking for. You could easily write a program that grabs a set of N search results, which in turn generates N concurrent API queries, which in turn generates N^2 concurrent API requests, which in turn generates N^3 requests.
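To put numbers on that fan-out: if each response yields N new requests, level k of the crawl issues N^k requests at once. A quick calculation, with N = 10 and three levels chosen purely for illustration:

```ruby
# Requests issued at each level of a recursive crawl in which every
# response spawns n follow-up requests (level 0 is the initial search).
def requests_per_level(n, depth)
  (0..depth).map { |level| n**level }
end

levels = requests_per_level(10, 3)
puts "per level: #{levels.inspect}"   # [1, 10, 100, 1000]
puts "total: #{levels.sum}"           # 1111
```

A modest N of 10 already means a thousand simultaneous requests at the third level.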

You get the idea. Rather than being simple request & response mechanisms, APIs in fact expose all of the data they contain -- in parallel and all at once. A single concurrent Erlang client can easily create as much simultaneous load as 10,000 individual sequential API consumers do now.

API developers should start pondering the answer to this question. Right now, there are no standards for enforcing best practices on most APIs. There's nothing to stop a developer from requesting the same data over and over again from an API, other than things like the Google maps geolocation API limit of 50,000 requests per day. But what about caching and expiring data, refusing repetitive requests, enforcing bandwidth limits or other strategies?
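As one example of the strategies in that list, a server can enforce a per-client request budget with a token bucket. The sketch below is a generic, in-memory version, not any particular API's mechanism; the capacity and refill rate are arbitrary, and time is passed in explicitly (in seconds) so the behaviour is easy to test.

```ruby
# Minimal token bucket: holds up to `capacity` tokens, refilled at `rate`
# tokens per second. Each request spends one token or is refused.
class TokenBucket
  def initialize(capacity, rate, now = 0.0)
    @capacity = capacity.to_f
    @rate = rate.to_f
    @tokens = @capacity
    @last = now
  end

  # Returns true if a request at time `now` (in seconds) is allowed.
  def allow?(now)
    elapsed = now - @last
    @last = now
    @tokens = [@capacity, @tokens + elapsed * @rate].min
    return false if @tokens < 1

    @tokens -= 1
    true
  end
end

# One bucket per client, created on first use.
class RateLimiter
  def initialize(capacity, rate)
    @buckets = Hash.new { |h, client| h[client] = TokenBucket.new(capacity, rate) }
  end

  def allow?(client, now)
    @buckets[client].allow?(now)
  end
end

limiter = RateLimiter.new(3, 1)  # bursts of 3, then 1 request per second
4.times { |i| puts "client a, request #{i + 1}: #{limiter.allow?('a', 0.0)}" }
```

In production, `now` would come from a monotonic clock such as Process.clock_gettime(Process::CLOCK_MONOTONIC), and the buckets would need to live somewhere shared by all server processes.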

Many people do all of these things in different ways, but we're at the tip of the iceberg in terms of addressing these kinds of issues. A poorly designed sequential API client is one thing; a badly designed concurrent API client is another thing altogether and could constitute a kind of DoS (denial of service) attack.

Start thinking now about how you're going to deal with the guy who polls you every 10 seconds for the latest status of all 142,000 of his users -- in parallel, 15,000 at a time.

And for you would-be API terrorists out there, here's some code:

-module(youtube_fetch).  % hypothetical module name; the original declaration was not shown
-export([fetch_sequential/0, fetch_parallel/0]).
-include_lib("xmerl/include/xmerl.hrl").  % defines the #xmlText record

% Requires the inets application to be running (e.g. inets:start()).
get_feed() ->
    { ok, {_Status, _Headers, Body }} = http:request(""),
    { Xml, _Rest } = xmerl_scan:string(Body),
    xmerl_xpath:string("//author/name/text()", Xml).

get_user_profile(User) ->
    #xmlText{value = Name} = User,
    URL = "" ++ Name,
    { ok, {_Status, _Headers, Body} } = http:request(URL),
    { Xml, _Rest } = xmerl_scan:string(Body),
    [#xmlText{value = Id}] = xmerl_xpath:string("//id/text()", Xml),
    [#xmlText{value = Published}] = xmerl_xpath:string("//published/text()", Xml),
    { Name, Id, Published }.

fetch_sequential() ->
    lists:map(fun get_user_profile/1, get_feed()).

% Spawn one process per user, then collect one reply per user.
fetch_parallel() ->
    Users = get_feed(),
    lists:foreach(fun background_fetch/1, Users),
    gather_results(Users).

background_fetch(User) ->
    ParentPID = self(),
    spawn(fun() ->
        ParentPID ! { ok, get_user_profile(User) }
    end).

% Results arrive in completion order, not feed order.
gather_results(Users) ->
    lists:map(fun(_) ->
        receive
            { ok, Anything } -> Anything
        end
    end, Users).
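For Rubyists, here is a rough analogue of the spawn-and-gather pattern in fetch_parallel, using threads and a Queue in place of Erlang processes and mailboxes. The fetch is faked with a short sleep (fake_profile is hypothetical), so it only shows the shape of the idea. Ruby threads are much heavier than Erlang processes, although they do overlap well while waiting on I/O.

```ruby
require 'thread'

# Hypothetical stand-in for get_user_profile: pretend each lookup is slow.
def fake_profile(user)
  sleep 0.05
  [user, user.upcase]
end

# Sequential: one lookup after another, like fetch_sequential.
def fetch_sequential(users)
  users.map { |u| fake_profile(u) }
end

# Parallel: one thread per user, each sending [:ok, result] to a queue
# (the mailbox stand-in); the parent then receives once per user.
# As in gather_results, replies come back in completion order.
def fetch_parallel(users)
  mailbox = Queue.new
  users.each { |u| Thread.new { mailbox << [:ok, fake_profile(u)] } }
  users.map do
    _status, result = mailbox.pop  # blocks, like receive
    result
  end
end

users = %w[alice bob carol dave]
t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
fetch_sequential(users)
t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
fetch_parallel(users)
t2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
puts format("sequential %.2fs, parallel %.2fs", t1 - t0, t2 - t1)
```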

Sunday, November 4, 2007

Erlang Makes My Head Hurt

For those of you who haven't heard about Erlang yet, it is a functional programming language (like Lisp or Prolog) developed quietly over the last 20 years by telecoms giant Ericsson for use in telco switches.

Ericsson has been using it for roughly the last 14 years; it has several properties that make it particularly relevant to many of the problems facing developers today. It's one of the few languages that is particularly good at letting programmers take advantage of multi-core/multi-CPU systems, and distribute services across multiple boxes. YAWS, a webserver written in Erlang and a poster child of its efficiencies, kicks Apache's tail from a scalability standpoint. This is no small accomplishment for a high level language.

Today's scaling strategies revolve less around faster clock speeds and more around adding cores. Scaling out to many machines is also important, but power and space considerations are also more of an issue than ever before.

So Erlang is gaining ground because it addresses scalability for this new age of multi-core systems. Today you might have a dual Clovertown Xeon box with 8 cores, but very little software to take advantage of it. Once you get past 2 or 4 cores, that extra capacity provides little to no benefit. Enter a language like Erlang, and suddenly all that power becomes available to the programmer.

Some of my coder buddies (Jay Phillips, Rich Kilmer and Marcel Molina) have been looking at Erlang for various tasks, and Dave Thomas' blog posts on Erlang have also inspired me to take a look at the language for some of my own work.

I picked up Joe Armstrong's book Programming Erlang, from the Pragmatic Bookshelf, and started reading it on a recent airplane flight.

Today I put together my first Erlang program, based on knowledge gleaned from Dave Thomas' postings and from the book.

This very simple program grabs the top_rated feed of videos from YouTube (an XML RSS feed) and then iterates through the result set to get the profile URL for each user. It is a fairly useless and trivial example, but if I can make this work then there are other things I can do down the line.


get_feed() ->
    { ok, {_Status, _Headers, Body }} = http:request(""),
    { Xml, _Rest } = xmerl_scan:string(Body),
    xmerl_xpath:string("//author/name/text()", Xml).

get_user_profile(User) ->
    {_,[A|B],_,[],Name,_} = User,
    URL = "" ++ Name,
    { ok, {_Status, _Headers, Body }} = http:request(URL),
    { Xml, _Rest } = xmerl_scan:string(Body),
    [{_,[C|D],_,[],Id,_}] = xmerl_xpath:string("//id/text()", Xml),
    { Name, Id }.

fetch_each_user() ->
    lists:map(fun get_user_profile/1, get_feed()).

I am pretty sure I am doing this All Wrong (tm).

My biggest area of confusion comes in the pattern matching that's required to match (and thus read) the results from the xmerl_xpath:string parsing. According to Dave Thomas' examples, xmerl_xpath should produce a #xmlText record (or a set of them) that can then be matched with the #xmlText{} syntax.

In practice, and with the Youtube API data I used, I see no such #xmlText records. Instead I get a flattened tuple from such parsing, along the lines of:

>xmerl_xpath:string("//location/text()", Xml)

The only way I can find to match this is something like this:

[{_,[A|B],_,[],Location,_}] = xmerl_xpath:string("//location/text()", Xml)

I am sure I am missing some key step or concept, but that's how we learn new languages -- stumble along until we figure out how to solve the things we want to solve.

There's an incredible amount of functionality packed into these 19 lines of code. It'll be even more amazing when I figure out my initial questions and then add concurrent processing of the user profile URLs. In theory I can simultaneously process dozens of URL feeds from Youtube and spider their API data as though through a fire hose. Stay tuned.

Meantime if anyone has any suggestions on my current puzzlements I'd love to hear them.

Erlang is a cool language. It doesn't give me the aesthetic fuzzies I receive from programming in Ruby, but I do get pretty jazzed up thinking about what should be possible from a performance and economy standpoint. Erlang doesn't allow for mutable state; variables have fixed values once assigned, and algorithms are generally handled via recursion. This is how it scales out to so many cores/cpus/machines so readily. It's kinda weird if you're used to "normal" mutable state languages.

Whenever I learn a new language (human or computer) I generally have weird and overactive dreams. I attribute this to my brain shuffling things around to accommodate new grammar and semantics.

The last few days have produced particularly vivid dreams.

Monday, October 15, 2007

18 Months Windows-Free (Nearly)

I'm Dave and I am a former Windows user.

Not that I ever liked it. Back in the day, I used Atari 8-bit and 16-bit 68000 computers. The Atari ST machines were cool because you could run Mac programs on them with the help of the Spectre GCR Mac emulator, and the native Atari programs (like PageStream and Calamus) were actually very good themselves. Power without the price. Stickin' it to the man never felt so good... we had the best of both worlds.

Around 1994, as I was also getting into Linux, I started to use Windows as my primary desktop UI. It sucked, but at least back in those days (Windows for Workgroups 3.11) you knew how it sucked and why. And in general you could work around it. It was lightweight enough to be manageable.

Back when I ran an ISP, I developed a bunch of software using Microsoft SQL Server, ASP, Access, and other relatively common, garden-variety tools of the day. It got the job done and I was happy enough.

During the Mac's PowerPC years, I always found the Mac to be needlessly obscure and imperious; its choice of the PowerPC architecture, while admirable from a performance standpoint, just made very little sense in terms of interfacing with the rest of the world.

The Web hadn't really emerged as a viable application development platform at that point, and the Mac was pointlessly obscure in the face of Windows. Everything was available for Windows, and the Mac was precious, delicate, and oh-so-special. I wasn't interested, despite my respect for the platform.

Around December 2004 I succumbed and bought an iBook G4, a PowerPC machine. As a software developer I was curious about how OS X was coming along so I thought it would be cool to have a current Mac.

When Steve announced that Apple would be switching to Intel chips, and the first Intel Macs arrived in early 2006, I felt a nearly religious change of heart towards Apple, or that Steve had one towards me. The implications were obvious: the long freeze was over. Mac would become Intel friendly, and Intel-friendly OSes like Linux and Windows were suddenly going to be a possibility on the Mac. Yeah, I am aware that there were ways to run Linux and Windows on PPC, but it was hard (and obscure). I'm all about ubiquity and reaching for things that can be done on a huge platform.

I went out and bought a Mac Mini Core Duo shortly after and have never looked back. While I'm writing this on my old decrepit iBook G4, I also own a MacBook Pro, a MacBook, a Mac Pro, a MacMini, an iMac, an iPhone, and two iPods. I am a certifiable Apple Fanboy, though I try hard to hide it (and mitigate it).

I still use Windows to run Quickbooks and Quicken, and the occasional odd program (like the Nokia phone firmware updater). It seems it can't be easily avoided. The Mac versions of both Quickbooks and Quicken are crimes against humanity, though the Windows versions aren't much better. No matter, home is where the heart is, and I must say that to finally be using a decent OS on decent hardware on a regular basis is truly bliss.

Now I read reports about Microsoft's dominance in the OS space and I just shrug and think "yeah I guess", while I myself have been shielded from the tyranny for nearly 2 years... Now when I run Windows, I look at it as some outmoded form of existence that I revisit now and again for nostalgia.

Don't even talk to me about Vista.

Last February, upon its release, I went out and bought a copy, thinking that as a technologist, I should know what it does and doesn't do. As an optimist, I figured it had to have some redeeming qualities. After loading it on my PC, I can say it was a pointless exercise bordering on utter disaster.

I wanted to "experience" the Aero-glass features, so I bought a new $175 video card. I bought a new $200 300GB hard drive so I could install Vista without imperiling my old XP installation. This was all a huge mistake. I ended up with my XP installed as a secondary drive, and a bunch of programs that wouldn't run. Accept or Deny?

Then my Vista boot drive died, and the whole thing ended up as a pile on the floor with a Knoppix Live-CD stuffed in the DVD drive, acting like a life-raft on the Titanic, trying to tar+scp things off onto whatever machine would take it. If I had a gun, I'd shoot the thing. It is dead, and Vista killed it as far as I am concerned.

Now I keep my virtual machines on an external USB drive I can carry between my Mac Pro and my MacBook Pro (depending on where I am) and I am a lot happier.

Bless you Steve, for finally coming around to Intel. It may not always be better, but at least it's what everybody else is using. Now when I hear about the latest stupid ideas from Microsoft, I can just shrug them off, secure in the knowledge that a) my Mac will work great, and b) I can run Linux or BSD or Solaris to do anything else.

And I can even know that I can run Windows, if I absolutely must. For creative professionals (and by this I include everybody from artists to coders to database guys), the Mac is truly a gift to you. Enjoy it, appreciate it. If you are still on Windows or forcing yourself to use a Linux UI for ideological pride, it's time to move.

Anyone with a creative bone in their body should be using today's Macs.

Saturday, October 6, 2007

MoMA NY Selects Twittervision & Flickrvision

Yesterday, I received final confirmation that the Museum of Modern Art in New York has selected my mash-ups Twittervision and Flickrvision for its 2008 exhibition Design and the Elastic Mind.

I'm certainly very flattered to be included and have never considered myself to be an artist. I didn't seek out MoMA on this. I am just very, very happy to have an opportunity to participate in a small way in the ongoing dialog about what technology means for humanity. Crap. Now I sound like an artist.

Incidentally, this means that Twittervision and Flickrvision are the first ever Ruby on Rails apps to be included in a major art exhibition. I already told DHH.

Anyway, at RailsConf Europe a few weeks ago, Dave Thomas' keynote speech emphasized the role of software designers as artists. He said, "treat your projects as though they are artworks, and sign your name to them." Or pretty close to it. I think this is incredibly valuable advice for software designers today.

We're past the days of using machines as amplifiers of our physical efforts. It's not enough to jam more features into code just so we can eliminate one more position on the assembly line. We're at a point where the machines can help amplify our imaginations.

Today, creativity and imagination (what some folks are calling the right brain) are becoming the key drivers of software and design. With imagination, we can see around the corners of today's most pressing challenges. While technical skill is certainly valuable, if it's applied to the wrong problems, it's wasted effort.

Creativity, imagination, and artistry help us identify the areas where we should put our efforts. They help us see things in new ways.

Everywhere I turn (perhaps partly because I am a Rubyist), I hear discussions of Domain Specific Languages, and of framing our problems in the right grammars.

This is hugely valuable because the creative part of our brain thinks in terms of semantics, grammars, and symbols. If we can't get the words right, our imaginations can't engage.

Everything stays stuck in the left side of our brains when we have to jump through hoops to please some particular language or development environment.

I hope you all will come out to see Design and the Elastic Mind when it opens at NYC MoMA, Feb 24 - May 12 2008. I'm not sure how we're going to present the sites but we're going to see if we can get some partners and sponsors involved to do something really beautiful.

And again, thanks to MoMA for the selection. And here's to creativity, imagination, and artistry as the next big thing in software design!

Adhearsion is Moving Forward in a Big Way!

Over the next two weeks, Jay Phillips, Chad Fowler, Marcel Molina, Rich Kilmer, Ed Guy, Glenn Dalgliesh and I are getting together to work on advancing Adhearsion, the open source VoIP technology.

For those of you who don't know about Adhearsion, it brings a simple, elegant grammar to the world of VoIP. It's an object-oriented DSL (domain specific language) written in Ruby. But that's what's going on underneath. Here's what's going on for you, the user:

# This is an example extensions.rb file which
# would handle how calls are processed by
# Asterisk. This is all completely valid Ruby
internal {
  case extension
  # 111 comes before the 100...200 range; otherwise the
  # range would match it first and meetme would never run
  when 111 then exec :meetme

  when 100...200
    callee = User.find_by_extension extension
    unless callee.busy? then dial callee
    else voicemail extension
    end

  when 888
    play weather_report('Dallas, Texas')

  when 999
    play %w(a-connect-charge-of 22
            cents-per-minute will-apply)
    sleep 2.seconds
    play 'just-kidding-not-upset'
  end
}

Obviously this is much more palatable than what you might find in your average Asterisk extensions.conf file.
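How can a block like internal { ... } be plain Ruby? One common technique is to store the block and later run it with instance_eval against an object representing the call, so bare words such as extension and play resolve to that object's methods. This is a hypothetical sketch of that technique, not Adhearsion's actual implementation; CallContext, DIALPLAN, route, and the sound and channel names are made up.

```ruby
# Hypothetical call context: records the actions a dialplan block takes.
class CallContext
  attr_reader :extension, :actions

  def initialize(extension)
    @extension = extension
    @actions = []
  end

  def play(*sounds)
    @actions << [:play, sounds.flatten]
  end

  def dial(target)
    @actions << [:dial, target]
  end

  def voicemail(ext)
    @actions << [:voicemail, ext]
  end
end

# Hypothetical registry of dialplan contexts, keyed by name.
DIALPLAN = {}

# `internal { ... }` just stores its block for later.
def internal(&block)
  DIALPLAN[:internal] = block
end

# Run a stored block with `self` set to a fresh call object.
def route(context_name, extension)
  call = CallContext.new(extension)
  call.instance_eval(&DIALPLAN.fetch(context_name))
  call.actions
end

internal {
  case extension
  when 888 then play 'weather-report'
  when 100...200 then dial "SIP/#{extension}"
  else voicemail extension
  end
}

p route(:internal, 150)
```

Because the block is evaluated with the call object as self, the dialplan reads like a script while each method stays an ordinary, testable Ruby method.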

Chad, Marcel, and Rich are some of the biggest names in the Ruby & Rails communities. Ed Guy is a legend in Open Source telephony. Jay is the originator of Adhearsion. Glenn, Ed, Jay, and I all work together for the project's sponsor, Truphone. There is some thought that with all of us on the job, Adhearsion might just become the next big thing to come out of the Ruby community.

We'll see about that; it could certainly happen. One thing that is for sure though is that our efforts should bring a level of beauty and clarity heretofore unrealized in the VoIP/telephony/collaboration world, and that certainly is a good thing.

My Wife Is Julie, the 1974 American Girl Historical Character

We were on vacation in San Francisco in July 2005 when my wife was asked to pose hanging off a cable car by some photographers from American Girl.

In New York yesterday, we took our daughter to the American Girl store there and were greeted with these giant 7' tall posters. Jennifer immediately remembered the incident in San Francisco. The staff was amused and gave us a free poster. And I'm amused that she's the 1974 character who gets to say things like "Far Out."

She looked up the illustrator, Robert Hunt, online. He's apparently a major illustrator in the business, having done the artwork for the Dreamworks logo (kid sitting on the moon) as well as a bunch of other major work. Anyway, he described his process very thoroughly, and it seems unlikely that her "likeness" was used, as that would have required a model release, etc.

The team taking the photos was so emphatic (watch for it, it'll be you!) though and the overall likeness to the pose that day is so great that we think those shots were used for blocking out the design.

We'll probably never know, but these little coincidences add a touch of magic to life.

Thursday, September 20, 2007

Sigh... Blogging: Beginning of the End?

Few people will likely read this post, as it's the first in my blog.

I want to say, first, that I am opposed to the idea of blogs on principle. Not because I don't want people to express themselves, or that I am any kind of luddite. What I fear is a universe of commentators and navel-gazers who are so busy pontificating, prognosticating, pointing fingers and engineering calculated praise that they don't have time to create anything original or real.

Taken to its logical conclusion, a world of bloggers is largely a world of watchers, not a world of do-ers. A world of masturbatory, self-congratulatory noise, with so little signal remaining as to be undetectable. A veritable echo-chamber of idiocy. A post-apocalyptic world of film critics.

So, with some reservations and resentment, this is the world I now join. Hi, Blogosphere. I hate your name. It is ersatz, self-important, and pretentious. It's representative of our failures as a society. We are so decadent and self-absorbed that we have actually given a name to our latent, reactionary rants. And not a very good one. Blog is bad enough. Blogosphere? C'mon. If it were the seventies we'd be calling it the "Wide World of Amateur Halfwits."

I'm not saying Blogging portends the fall of human civilization. On the whole prime time television has been leading the way there for at least 50 years now.

So why do it at all? In fairness, I felt like I had to. Where I roll, information is everything, and information is interconnected. The currency of this world is attention. When people want to pay it to you, you need to give them a signal to follow. In practice, that signal is best produced in the form of a blog. Having a blog is like having a bank account in the attention economy. You can't get paid without it.

And thus my blog was born. I consider myself a contemplative person. If I am going to have something as preposterous as a blog, you can expect that I will put some effort into it.

So there's a flaw in my logic. If blogs are all noise and no signal, then why bother trying to make it good? Because I care about what I do and say. And perhaps I should allow for the fact that others do too.

Damn it, blogosphere. Forgive me. And as for you, the jury is still out. History will tell us if blogging was the beginning of the end of civilization.