Resurrecting feed_tools
by Sean Cribbs
Last week, a couple of unrelated sites I’ve worked on converged on the same problem: slow loading of data from RSS feeds. In one case, the site at times refused to load a feed at all, then at other times added a whole second to the page load time! The culprit, of course was feed_tools
. While a very versatile and flexible library at parsing even ill-formed feeds, feed_tools
is notorious for poor performance and huge memory usage. I was inspired by Charlie Savage’s resurrection of libxml-ruby
to do the same for feed_tools
, partly because it’s an old library (started in 2005) and because it would be an excuse to use the new hotness that is libxml-ruby
.
To get a sense of my progression on this, follow my Twitter page. Here’s a few:
If FeedTools were to have a mental disorder, it would definitely be paranoid schizophrenia. Let’s hope I can be an anti-psychotic.
FeedTools’ largest problem is a lack of DRY (and poor Ruby style). The real intention of the code is lost in the repetition.
FeedTools uses ObjectSpace. EPIC FAIL
Confessions
If we’re going to give feed_tools
a clean conscience, let’s first confess its sins. And believe me, they are many!
ObjectSpace
One of the things that immediately jumped out to me was the use of ObjectSpace
. Any experienced Rubyist knows that ObjectSpace
is a dangerous thing to play with and should not be used in performance-critical situations. feed_tools
ignores that precept and uses it to find the parent feed of an individual item, on the premise that it will help with cleaning up dangling objects during garbage collection (which seems wrong to me):
To add insult to injury, the result is not memoized, so every time the FeedItem
object has to access its parent feed, you have to go through the ObjectSpace
loop again! This is one place where I feel it should not be optimizing for memory over speed when the solution is very simple — set the parent feed on initialization or when adding to an existing feed.
!object#nil?
vs. object
One thing that programmers new to Ruby don’t often grasp immediately is that nil
and false
are equivalent in conditional expressions and anything else (i.e. an object) is equivalent to true
. This makes many typical scenarios where one might want to test if a method call returned a value, or returning anyone of several possible values much easier and clearer. feed_tools
is littered with a paranoia about nil
values. Here’s just one example:
Those last three if
statements would obviously be more succinctly and clearly said like so (additionally without the ‘self’ fetish):
Much cleaner! The original code is full of things like this.
Exception paranoia
Just like the obsession with #nil?
, FeedTools is obsessed with rescue
blocks. There is nothing inherently wrong with begin...rescue...end
blocks, however FeedTools seems to use them willy-nilly and without specificity. An example:
The problem with the rescue
this code snippet is that it reflects a lackadaisacal attitude toward what exception is thrown and by what statement.
This is just a preview of the greater architectural problems that plague FeedTools — inconsistent interfaces, broken encapsulation, misplaced responsibility, lack of division of labor, reified utility modules, an incredible amount of repetition, and a general ignorance of Ruby style and convention. If I went into all the details, this would be a much longer article than I want. So let’s pull out our Jump to Conclusions Mat and take a leap.
Going forward
As you may have found from the links above, I’ve created a github project for my refactoring of the library. You’ll want to pay attention to the ‘libxml’ branch where I’m doing most of the work. Here are my goals:
- Decouple the XML-parsing framework from the feed-parsing, abstracting out the differences between
libxml-ruby
, Hpricot, and REXML. - Use meta-programming and good Ruby style to simplify and clarify the code.
- Separate responsibilities into appropriate modules and classes.
- Maintain a substantial amount of backwards-compatibility with FeedTools 0.2.x, with the exception of the internal API.
- Maintain the ability to recognize and parse any feeds that FeedTools currently recognizes, using the existing test suite.
- Improve the test suite by adding more focused tests on individual components.
- Improve performance and reduce memory consumption using real numbers from
ruby-prof
and other appropriate tools.
Obviously the first three goals are the most significant on my list. I’d appreciate any feedback you can give me, either on the github project or via email.