I've released the first version of Planet Express, a Python script that uses a blog as the public interface for a feed aggregator. Planets are public feed aggregators; for example, you can read Planet Movable Type or Planet Python. The mother of all planets is Planet, a Python script that reads a list of feeds and uses templates to generate the planet's front page. However, I think that using a blog for the front page is more efficient, and it takes advantage of full-featured CMSs like Movable Type, Blogger or TypePad. Right now, I'm testing the 0.1 release (source code available under a BSD license) on Planeta Canarias.
It requires Python 2.1. It uses Universal Feed Parser (Mark Pilgrim) and OPML Parser (David Janes). Planet Express reads an OPML file containing a list of feeds. Then, it reads each feed and extracts the new entries. Finally, it publishes the new entries in a blog using the metaWeblog API. For each entry, Planet Express saves a unique identifier in express.hsh. This file is read when parsing new entries, so that an entry is never published twice. Entry titles in the aggregator blog use the format:
<blog URL>blog title</blog URL>: <post URL>post Title</post URL>
The blog's URL and title are taken from the OPML feed file (the htmlUrl and title attributes of each outline element).
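The flow described above can be sketched roughly as follows. This is not Planet Express's actual code: it is a minimal illustration in modern Python 3 that uses only the standard library in place of Universal Feed Parser and OPML Parser. The function names, the sample OPML, and the in-memory `seen` set (standing in for express.hsh) are all assumptions, and the title format is interpreted here as two HTML links.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client
from html import escape

# Hypothetical sample OPML; a real file lists one <outline> per feed.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline title="Example Blog" htmlUrl="http://example.org/"
             xmlUrl="http://example.org/index.xml"/>
  </body>
</opml>"""


def read_outlines(opml_text):
    """Return title, htmlUrl and xmlUrl for every outline that has a feed."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        if outline.get("xmlUrl"):  # skip grouping outlines without a feed
            feeds.append({
                "title": outline.get("title", ""),
                "htmlUrl": outline.get("htmlUrl", ""),
                "xmlUrl": outline.get("xmlUrl"),
            })
    return feeds


def entry_title(feed, post_title, post_url):
    """Build the aggregator title: linked blog name, colon, linked post title."""
    return '<a href="%s">%s</a>: <a href="%s">%s</a>' % (
        escape(feed["htmlUrl"], quote=True), escape(feed["title"]),
        escape(post_url, quote=True), escape(post_title))


def new_entries(entries, seen):
    """Yield entries whose identifier is not yet in `seen`, recording each."""
    for entry in entries:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            yield entry


def publish(endpoint, blog_id, user, password, title, body, link):
    """Post one entry with metaWeblog.newPost and return the new post id.

    The endpoint, blog id and credentials are placeholders supplied by
    the caller; nothing here is taken from express.cfg.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    post = {"title": title, "description": body, "link": link}
    return server.metaWeblog.newPost(blog_id, user, password, post, True)
```

In a real run, the `seen` set would be loaded from disk before parsing and saved again afterwards, which is the role express.hsh plays.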
The remote publishing configuration is stored in the express.cfg file, which has the following format:
You will find more information in the README and INSTALL files in the released package. Suggestions are welcome.
Similar software includes Ben Hammersley's Crossposter for MT and Morbus' MyRSSMerger, both written in Perl. Planet Express' initial implementation was also done in Perl, but some of the CPAN modules I used had trouble dealing with feed encodings (some Spanish feeds use ISO-8859-1 or ISO-8859-15, and others use UTF-8). Pilgrim's Feed Parser is a kick-ass library and supports both RSS (in its many versions) and Atom.
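To illustrate why mixed encodings cause trouble, here is a small Python 3 sketch. It is only an illustration of the general problem, not the behaviour of any particular CPAN module or of Feed Parser, and the sample word is made up.

```python
# The same Spanish word produces different bytes in each encoding.
word = "año"
as_latin1 = word.encode("iso-8859-1")  # ñ is the single byte 0xF1
as_utf8 = word.encode("utf-8")         # ñ is the two bytes 0xC3 0xB1

# Decoding with the encoding the feed actually uses round-trips cleanly.
assert as_latin1.decode("iso-8859-1") == word
assert as_utf8.decode("utf-8") == word

# Treating UTF-8 bytes as ISO-8859-1 silently yields garbled text.
garbled = as_utf8.decode("iso-8859-1")

# Treating ISO-8859-1 bytes as UTF-8 fails outright.
try:
    as_latin1.decode("utf-8")
    latin1_as_utf8_failed = False
except UnicodeDecodeError:
    latin1_as_utf8_failed = True

# The euro sign exists in ISO-8859-15 but not in ISO-8859-1.
euro_latin9 = "€".encode("iso-8859-15")
```

An aggregator that merges feeds from many blogs has to decode each one with its own declared encoding before republishing, otherwise accented characters end up mangled.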