hadoop-common-user mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: Distributed Updateable Cache
Date Fri, 23 Jul 2010 03:35:40 GMT
HBase? Memcached?

On Jul 22, 2010, at 4:56 AM, Vitaliy Semochkin wrote:

> Hi,
>
> I need to do calculations that would benefit from storing information in a
> distributed updateable cache.
> What are the best practices for such things in Hadoop?
>
> PS
> In case there is no good solution for my problem, here are the details and
> ideas I have.
> I'm going to count the unique visitors of a site several times per day (every
> 5 minutes). For that I will need a distributed cache that is accessible from
> all mappers to store the visitors that have already been counted.
>
> My plan is:
> store the unique visitors in a file on HDFS
> each time a mapper JVM starts, load that file into a HashSet in that JVM
> (I use mapred.job.reuse.jvm.num.tasks=-1)
> after each map/reduce job, append the newly seen visitors to this file
>
> Any criticism and advice are welcome :-)
>
> Regards,
> Vitaliy S
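
For reference, the quoted plan could be sketched roughly as follows. This is only an illustration under stated assumptions, not Vitaliy's actual code: it uses a local file through java.nio as a stand-in for the HDFS file, and the class name UniqueVisitorStore and its methods load, findNew, and append are hypothetical names invented for this sketch. It also assumes one visitor ID per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; a local file stands in for the file on HDFS.
final class UniqueVisitorStore {

    // Step 2 of the plan: read the already counted visitors into a HashSet.
    static Set<String> load(Path file) throws IOException {
        Set<String> seen = new HashSet<>();
        if (Files.exists(file)) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String id = line.trim();
                if (!id.isEmpty()) {
                    seen.add(id);
                }
            }
        }
        return seen;
    }

    // Return the IDs in this batch that were not seen before (deduplicated).
    static Set<String> findNew(Set<String> seen, List<String> batch) {
        Set<String> fresh = new LinkedHashSet<>();
        for (String id : batch) {
            if (!seen.contains(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }

    // Step 3 of the plan: append the newly seen visitors to the file.
    static void append(Path file, Set<String> fresh) throws IOException {
        Files.write(file, fresh, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("visitors", ".txt");

        // First 5-minute batch: nothing has been seen yet.
        Set<String> first = findNew(load(file), Arrays.asList("a", "b", "a"));
        append(file, first);
        System.out.println("new visitors in batch 1: " + first.size());

        // Second batch: reload the file, as a freshly started JVM would.
        Set<String> second = findNew(load(file), Arrays.asList("b", "c"));
        append(file, second);
        System.out.println("new visitors in batch 2: " + second.size());

        Files.delete(file);
    }
}
```

Note that in this sketch each JVM sees only the snapshot it loaded at startup, so two mappers running in the same job could both report the same visitor as new. A shared store such as HBase or Memcached, as suggested in the reply, avoids keeping a separate copy per JVM.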

