hadoop-common-user mailing list archives

From "Manoj Bist" <manoj.bi...@gmail.com>
Subject Using Nutch for crawling + storing RSS feeds.
Date Fri, 04 Jan 2008 01:32:42 GMT

I need to build a system that periodically crawls a given set of RSS feed
URLs. For each RSS feed, the system needs to maintain a master RSS feed that
contains every item ever seen, i.e. even after old items are dropped from
the source feed, the master RSS feed still contains them.
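
To make the requirement concrete, here is a minimal sketch of the "master
feed" merge step in Python, using only the standard library. It assumes
well-formed RSS 2.0 (items under rss/channel/item) and identifies an item by
its <guid>, falling back to <link> and then <title>. The function names
item_key and merge_into_master are hypothetical and are not part of Nutch or
any other library. Fetching, scheduling, and storage are left out.

```python
import xml.etree.ElementTree as ET


def item_key(item):
    """Identify an item by <guid>, falling back to <link>, then <title>."""
    for tag in ("guid", "link", "title"):
        text = item.findtext(tag)
        if text and text.strip():
            return text.strip()
    return None  # item has nothing usable as an identity


def merge_into_master(master_xml, fetched_xml):
    """Return the master RSS XML with unseen items from fetched_xml appended.

    Assumes both documents are RSS 2.0 with a single <channel> element.
    Items already in the master are never removed, so items that the
    source feed has since dropped are retained.
    """
    master = ET.fromstring(master_xml)
    fetched = ET.fromstring(fetched_xml)
    master_channel = master.find("channel")
    seen = {item_key(i) for i in master_channel.findall("item")}
    for item in fetched.findall("channel/item"):
        key = item_key(item)
        if key is not None and key not in seen:
            master_channel.append(item)
            seen.add(key)
    return ET.tostring(master, encoding="unicode")
```

Running merge_into_master after every periodic fetch, with the previous
result as master_xml, would keep the master feed growing while the source
feed rotates its items. Merging the same fetched feed twice adds nothing
the second time, since every key is already in the seen set.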

Does something like this already exist? I noticed a couple of mail threads
pertaining to this, but it's not very clear whether Nutch is the right
framework for a task like this. I would really appreciate any
pointers/comments/suggestions regarding this.



