hadoop-zookeeper-user mailing list archives

From Thomas Koch <tho...@koch.ro>
Subject Re: feed queue fetcher with hadoop/zookeeper/gearman?
Date Mon, 12 Apr 2010 17:49:01 GMT
Mahadev Konar:
> Hi Thomas,
>   There are a couple of projects inside Yahoo! that use ZooKeeper as an
> event manager for feed processing.
> 
> I am a little bit unclear on your example below. As I understand it:
> 
> 1. There are 1 million feeds that will be stored in HBase.
> 2. A map reduce job will be run on these feeds to find out which feeds need
> to be fetched.
> 3. This will create queues in ZooKeeper to fetch the feeds.
> 4. Workers will pull items from this queue and process feeds.
> 
> Did I understand it correctly? Also, if the above is the case, how many queue
> items would you anticipate accumulating every hour?
Yes, that's exactly what I'm thinking about. Currently one node processes about
20,000 feeds an hour, and we have 5 feed-fetch nodes. That would mean ~100,000
queue items/hour. Each queue item should carry some metadata, most importantly
the feed items already known to the system, so that only new items get
processed.
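
To make the numbers and the queue item a bit more concrete, here is a rough
Python sketch. It is only an illustration under my own assumptions: the names
FeedQueueItem, known_item_ids and new_items are hypothetical, and JSON is just
one possible way to serialize an item into the bytes a queue entry would hold.

```python
import json
from dataclasses import dataclass, field, asdict

# Figures from this thread: one node handles about 20,000 feeds per hour
# and there are 5 feed-fetch nodes.
FEEDS_PER_NODE_PER_HOUR = 20_000
FETCH_NODES = 5

items_per_hour = FEEDS_PER_NODE_PER_HOUR * FETCH_NODES
items_per_second = items_per_hour / 3600


@dataclass
class FeedQueueItem:
    """Hypothetical queue item: a feed URL plus the item IDs already known."""
    feed_url: str
    known_item_ids: list = field(default_factory=list)

    def to_bytes(self) -> bytes:
        # Assumption: the payload is stored as UTF-8 encoded JSON.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "FeedQueueItem":
        return cls(**json.loads(data.decode("utf-8")))


def new_items(queue_item: FeedQueueItem, fetched_ids: list) -> list:
    """Return the fetched IDs that the system does not know yet, in order."""
    known = set(queue_item.known_item_ids)
    return [item_id for item_id in fetched_ids if item_id not in known]


if __name__ == "__main__":
    item = FeedQueueItem("http://example.org/feed.xml", ["a", "b"])
    restored = FeedQueueItem.from_bytes(item.to_bytes())
    print(items_per_hour, round(items_per_second, 1))
    print(new_items(restored, ["a", "b", "c"]))
```

A worker would take the payload from the queue, decode it, fetch the feed, and
pass only the result of new_items() on for processing. The list of known IDs
grows with the feed, so the payload size per queue item is something to watch.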

Thomas Koch, http://www.koch.ro
