Unclogging RSS Bandwidth

Wired's article "Will RSS Readers Clog the Web?" starts off with some interesting points about the bandwidth problems caused by aggregators, but I don't agree with some of its conclusions.

As the article states, it's definitely important for aggregators to support time-stamp checking (an HTTP conditional GET using the If-Modified-Since header) so they don't keep grabbing the same unchanged feed over and over. The flip side, of course, is that it's also important for servers to be able to provide this information. Pyblosxom (which powers this blog) doesn't seem to have this ability.
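For the curious, here's a minimal sketch of what the aggregator side of that looks like in Python. The feed URL is made up, and a real reader would persist the Last-Modified and ETag validators between runs:

```python
import urllib.error
import urllib.request

# Hypothetical feed URL, just for illustration.
FEED_URL = "http://example.com/index.rss"

def fetch_if_modified(url, last_modified=None, etag=None):
    """Fetch a feed only if it has changed since the last poll."""
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        request.add_header("If-None-Match", etag)
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # 304 Not Modified: the server sent no feed body at all.
            return None, last_modified, etag
        raise
    # Remember the validators so the next poll can be conditional.
    return (response.read(),
            response.headers.get("Last-Modified", last_modified),
            response.headers.get("ETag", etag))

body, last_modified, etag = fetch_if_modified(FEED_URL)
```

When nothing has changed, the whole exchange costs a few hundred bytes of headers instead of the full feed, which is exactly the saving the article is asking for.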

One of my problems with the article is that it never mentions that grabbing an RSS feed most likely consumes less bandwidth than loading the same page in a browser.

As the article puts it: "Whereas a human reader may scan headlines on The New York Times website once a day, aggregators check the site hourly or even more frequently."

While this may be true for some people, I suspect those are not the people using aggregators. Before using a feed reader I would check sites like Slashdot repeatedly during the day for updates. With an aggregator I actually visit the site less often, since I only need to load it when I see an article I'm interested in.

I decided to do a little math to see how the bandwidth of fetching the RSS feed compares to loading the actual page. Slashdot's RSS feed weighs in at 1.9K. That didn't seem like a fair comparison, since it only contains the titles; adding in the story bodies shown on the front page brings it up to 9.1K. Polling once an hour gives us 9.1K × 24 = 218.4K per day.

So how does that compare to reading the front page? Well, to be conservative I'll assume your browser has cached all the images and is only loading the HTML, which comes to ... 63K! Since 218.4K / 63K ≈ 3.5, reading Slashdot's front page 4 or more times a day uses more bandwidth than polling a full-content RSS feed every hour. (To be fair, Slashdot could really benefit from a CSS-based overhaul, but that's for a different article.)
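Here's the same back-of-the-envelope arithmetic spelled out, using the sizes measured above:

```python
# Back-of-the-envelope numbers from above, all sizes in kilobytes.
FEED_WITH_BODIES = 9.1   # Slashdot RSS feed with story bodies included
FRONT_PAGE_HTML = 63.0   # front-page HTML only, images assumed cached

polls_per_day = 24  # an aggregator polling hourly
feed_kb_per_day = FEED_WITH_BODIES * polls_per_day
print(f"Full-content feed, polled hourly: {feed_kb_per_day:.1f}K/day")  # 218.4

# How many front-page visits per day add up to the same bandwidth?
breakeven_visits = feed_kb_per_day / FRONT_PAGE_HTML
print(f"Break-even front-page visits: {breakeven_visits:.1f}")  # ~3.5
```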

OK, so I'll just assume there are sites out there where RSS is eating up their bandwidth. Wired (or at least some of the people they talked to) seems to think P2P would help alleviate these problems. I'm not so sure I agree. While protocols like BitTorrent can reduce the strain of file distribution on a server, they're really best at distributing large files. P2P adds fixed overhead to every transfer: for each requested file, the client has to fetch metadata, find out which peers are available, learn which portions of the file each peer has, and so on. That may not matter much when you're distributing files that are a few megs or more, but with RSS we're only looking at maybe 10K. I haven't run the numbers in detail, but it seems like it would be quite hard to make the overhead of P2P small enough to justify using it for RSS distribution.
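To make that concrete, here's a rough sketch of the fixed costs of a BitTorrent-style fetch. Every number except the 68-byte handshake is a guess I made up for illustration, not a measurement, but even charitable guesses make the point:

```python
# The byte counts below are assumptions for illustration, not measurements.
PAYLOAD = 10 * 1024            # a ~10K RSS feed

torrent_metadata = 600         # assumed size of the .torrent file itself
tracker_round_trip = 400       # assumed announce request + peer-list response
handshake = 68                 # the fixed-size BitTorrent handshake message
peer_chatter = 300             # assumed bitfield/have/request messages per peer
peers_contacted = 5            # assumed number of peers we end up talking to

overhead = (torrent_metadata + tracker_round_trip
            + peers_contacted * (handshake + peer_chatter))
print(f"Overhead: {overhead} bytes, "
      f"{overhead / PAYLOAD:.0%} of the {PAYLOAD}-byte payload")
```

Under these assumptions the bookkeeping alone is over a quarter of the payload, and none of it shrinks as the file gets smaller. For a multi-megabyte file the same overhead rounds to zero.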

Now, I don't understand why Wired has ignored the RSS Cloud specification. This allows an aggregator to subscribe to a feed just once a day; the server then sends the aggregator a notification whenever new content is published (thus publish/subscribe). Aggregators would only fetch the content when it had actually been updated, eliminating the redundant polling, and as a bonus they'd get more timely notification of updates. I'd like to see it implemented more widely.
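The subscription itself is a single remote call. Here's a hypothetical sketch of what it might look like in Python; the hostnames, port, and procedure names below are made up, with the real values coming from the <cloud> element of the feed you're subscribing to:

```python
import xmlrpc.client

# The connection details would come from the feed's <cloud> element, e.g.
# <cloud domain="rpc.example.com" port="80" path="/RPC2"
#        registerProcedure="cloud.pleaseNotify" protocol="xml-rpc"/>
# Everything here (hostnames, procedure names, port) is hypothetical.
server = xmlrpc.client.ServerProxy("http://rpc.example.com:80/RPC2")

# Subscribe: tell the cloud how to call our aggregator back when the
# feed changes, instead of us polling it every hour.
server.cloud.pleaseNotify(
    "aggregator.feedUpdated",         # procedure the cloud should invoke on us
    5335,                             # port our aggregator's listener uses
    "/RPC2",                          # path of our XML-RPC listener
    "xml-rpc",                        # notification protocol we speak
    ["http://example.com/index.rss"], # feeds we want to be pinged about
)
```

After that, the aggregator just runs a small XML-RPC listener and fetches the feed only when the ping arrives, renewing the subscription once a day.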