by Sean Cribbs
Last week, a couple of unrelated sites I’ve worked on converged on the same problem: slow loading of data from RSS feeds. In one case, the site at times refused to load a feed at all, then at other times added a whole second to the page load time! The culprit, of course, was feed_tools. While a very versatile and flexible library for parsing even ill-formed feeds, feed_tools is notorious for poor performance and huge memory usage. I was inspired by Charlie Savage’s resurrection of libxml-ruby to do the same for feed_tools, partly because it’s an old library (started in 2005) and partly because it would be an excuse to use the new hotness that is
To get a sense of my progression on this, follow my Twitter page. Here are a few:
If FeedTools were to have a mental disorder, it would definitely be paranoid schizophrenia. Let’s hope I can be an anti-psychotic.
FeedTools’ largest problem is a lack of DRY (and poor Ruby style). The real intention of the code is lost in the repetition.
FeedTools uses ObjectSpace. EPIC FAIL
If we’re going to give feed_tools a clean conscience, let’s first confess its sins. And believe me, they are many!
One of the things that immediately jumped out at me was the use of ObjectSpace. Any experienced Rubyist knows that ObjectSpace is a dangerous thing to play with and should not be used in performance-critical situations. feed_tools ignores that precept and uses it to find the parent feed of an individual item, on the premise that it will help with cleaning up dangling objects during garbage collection (which seems wrong to me):
To add insult to injury, the result is not memoized, so every time the FeedItem object has to access its parent feed, you have to go through the ObjectSpace loop again! This is one place where I feel it should not be optimizing for memory over speed when the solution is very simple: set the parent feed on initialization or when adding to an existing feed.
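As a rough sketch of the difference, here is a pair of hypothetical stand-in classes (the names Feed, FeedItem, and feed_via_object_space are mine for illustration, not the FeedTools source). One lookup scans every live Feed through ObjectSpace; the other simply stores the parent when the item is added.

```ruby
# Hypothetical stand-ins for illustration; not the actual FeedTools classes.
class Feed
  attr_reader :items

  def initialize
    @items = []
  end

  # Adding an item records the back-reference once, at insertion time.
  def <<(item)
    item.feed = self
    @items << item
    self
  end
end

class FeedItem
  attr_accessor :feed

  # Slow approach: walk every live Feed in the process on each call.
  def feed_via_object_space
    ObjectSpace.each_object(Feed) do |candidate|
      return candidate if candidate.items.include?(self)
    end
    nil
  end
end

feed = Feed.new
item = FeedItem.new
feed << item

item.feed.equal?(feed)                  # => true, plain reference lookup
item.feed_via_object_space.equal?(feed) # => true, but only after a heap scan
```

The stored back-reference does create a cycle between feed and item, but Ruby’s garbage collector is mark-and-sweep rather than reference-counted, so a cycle by itself does not keep the objects alive.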
One thing that programmers new to Ruby don’t often grasp immediately is that nil and false are equivalent in conditional expressions and anything else (i.e. an object) is equivalent to true. This makes many typical scenarios, such as testing whether a method call returned a value or returning any one of several possible values, much easier and clearer.
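A quick illustration of that rule; the helper names truthy? and display_title are made up for this example.

```ruby
# Only nil and false are falsy; every other object counts as true.
def truthy?(value)
  value ? true : false
end

truthy?(nil)   # => false
truthy?(false) # => false
truthy?(0)     # => true
truthy?("")    # => true
truthy?([])    # => true

# Hypothetical helper: || returns the first operand that is not nil or false,
# so "pick the first available value" needs no explicit nil checks.
def display_title(title, subtitle, fallback = "Untitled")
  title || subtitle || fallback
end

display_title(nil, "Daily news") # => "Daily news"
display_title(nil, nil)          # => "Untitled"
```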
feed_tools is littered with a paranoia about nil values. Here’s just one example:
Those last three if statements would obviously be more succinctly and clearly said like so (additionally without the ‘self’ fetish):
Much cleaner! The original code is full of things like this.
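To make the contrast concrete, here is a rough sketch of the pattern. The Entry class and its fields are hypothetical stand-ins, not code taken from FeedTools; the point is only the shape of the two methods.

```ruby
# Hypothetical class for illustration; not taken from the FeedTools source.
class Entry
  attr_reader :title, :summary, :content

  def initialize(title: nil, summary: nil, content: nil)
    @title = title
    @summary = summary
    @content = content
  end

  # Verbose style: explicit nil? tests, redundant self, explicit returns.
  def description_verbose
    if !self.content.nil?
      return self.content
    end
    if !self.summary.nil?
      return self.summary
    end
    if !self.title.nil?
      return self.title
    end
    return nil
  end

  # Idiomatic style: || yields the first operand that is not nil or false.
  def description
    content || summary || title
  end
end

entry = Entry.new(title: "Hello", summary: "A short post")
entry.description_verbose # => "A short post"
entry.description         # => "A short post"
```

The two versions differ only if a field can legitimately hold false, which is not a concern for string-valued fields like these.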
Just like the obsession with #nil?, FeedTools is obsessed with rescue blocks. There is nothing inherently wrong with begin...rescue...end blocks; however, FeedTools seems to use them willy-nilly and without specificity. An example:
The problem with the rescue in this code snippet is that it reflects a lackadaisical attitude toward what exception is thrown and by what statement.
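A small sketch of why specificity matters, using made-up helpers (parse_ttl and parse_ttl_sloppy are illustrative names, not FeedTools methods). A bare rescue traps every StandardError, so a genuine bug in the caller looks the same as bad feed data.

```ruby
# Catch-all: a bare rescue traps every StandardError from any statement.
def parse_ttl_sloppy(text)
  begin
    Integer(text)
  rescue
    nil
  end
end

# Targeted: only malformed numeric text is treated as "no value".
def parse_ttl(text)
  Integer(text)
rescue ArgumentError
  nil
end

parse_ttl("60")       # => 60
parse_ttl("sixty")    # => nil
parse_ttl_sloppy(nil) # => nil, hiding the fact that the caller passed nil
# parse_ttl(nil) raises TypeError, which points at the real bug.
```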
This is just a preview of the greater architectural problems that plague FeedTools — inconsistent interfaces, broken encapsulation, misplaced responsibility, lack of division of labor, reified utility modules, an incredible amount of repetition, and a general ignorance of Ruby style and convention. If I went into all the details, this would be a much longer article than I want. So let’s pull out our Jump to Conclusions Mat and take a leap.
As you may have found from the links above, I’ve created a github project for my refactoring of the library. You’ll want to pay attention to the ‘libxml’ branch where I’m doing most of the work. Here are my goals:
- Decouple the XML-parsing framework from the feed-parsing, abstracting out the differences between libxml-ruby, Hpricot, and REXML.
- Use meta-programming and good Ruby style to simplify and clarify the code.
- Separate responsibilities into appropriate modules and classes.
- Maintain a substantial amount of backwards-compatibility with FeedTools 0.2.x, with the exception of the internal API.
- Maintain the ability to recognize and parse any feeds that FeedTools currently recognizes, using the existing test suite.
- Improve the test suite by adding more focused tests on individual components.
- Improve performance and reduce memory consumption using real numbers from ruby-prof and other appropriate tools.
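To give a flavor of the meta-programming goal, here is one possible sketch of replacing hand-written, repetitive accessors with a small class macro. Everything in it (ParsedFields, parsed_field, Item, extract) is hypothetical and only illustrates the technique; it is not code from the refactoring branch.

```ruby
# Hypothetical sketch; names are illustrative, not part of FeedTools.
module ParsedFields
  # Defines one memoized reader per field name. Each reader calls the
  # including class's extract(name) at most once and caches the result,
  # even when that result is nil.
  def parsed_field(*names)
    names.each do |name|
      ivar = "@#{name}"
      define_method(name) do
        return instance_variable_get(ivar) if instance_variable_defined?(ivar)
        instance_variable_set(ivar, extract(name))
      end
    end
  end
end

class Item
  extend ParsedFields
  parsed_field :title, :link, :author

  attr_reader :extractions

  def initialize(raw)
    @raw = raw
    @extractions = 0
  end

  private

  # Stand-in for a real XML lookup; counts calls to show the memoization.
  def extract(name)
    @extractions += 1
    @raw[name]
  end
end

item = Item.new({ title: "Hello", link: "http://example.com/" })
item.title       # => "Hello"
item.title       # => "Hello", served from the cached value
item.extractions # => 1
```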
Obviously the first three goals are the most significant on my list. I’d appreciate any feedback you can give me, either on the github project or via email.