hadoop-common-user mailing list archives

From Sean Shanny <ssha...@tripadvisor.com>
Subject Re: Indexed Hashtables
Date Thu, 15 Jan 2009 03:54:04 GMT

So far we have had pretty good luck with memcached. We are building a
Hadoop-based solution for data warehouse ETL on XML-based log files
that represent click stream data on steroids.

We process about 34 million records, or about 70 GB of data, a day. We
have to process the dimensional data in our warehouse and then load the
surrogate <key><value> pairs into memcached so we can traverse the XML
files a second time to perform the substitutions. We are using
memcached because it scales out just like Hadoop. We will have code
that falls back to the DB if a memcached lookup fails, but that should
not happen too often.
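A rough sketch of the lookup-with-fallback plus substitution pass described above might look like the following. It is only an illustration under stated assumptions: a plain dict stands in for the memcached client, a callable stands in for the warehouse DB query, and the names `make_lookup`, `substitute`, and the `hotel` attribute are hypothetical, not taken from our actual code.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

Lookup = Callable[[str], Optional[int]]

def make_lookup(cache: Dict[str, int], db_lookup: Lookup) -> Lookup:
    """Hypothetical cache-aside lookup: try the cache (memcached stand-in) first, then the DB."""
    def lookup(natural_key: str) -> Optional[int]:
        value = cache.get(natural_key)        # fast path: the memcached get
        if value is None:
            value = db_lookup(natural_key)    # rare path: fall back to the DB
            if value is not None:
                cache[natural_key] = value    # repopulate so the next lookup hits the cache
        return value
    return lookup

def substitute(xml_record: str, attr: str, lookup: Lookup) -> str:
    """Replace the natural key stored in `attr` with its surrogate key on every element."""
    root = ET.fromstring(xml_record)
    for elem in root.iter():
        if attr in elem.attrib:
            surrogate = lookup(elem.attrib[attr])
            if surrogate is not None:         # leave unknown keys untouched
                elem.set(attr, str(surrogate))
    return ET.tostring(root, encoding="unicode")
```

Writing the surrogate back into the cache after a DB hit is the usual cache-aside choice; it keeps repeated misses on the same key from hammering the DB.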



Sean Shanny

On Jan 14, 2009, at 9:47 PM, Delip Rao wrote:

> Hi,
> I need to look up a large number of key/value pairs in my map(). Is
> there any indexed hashtable available as part of the Hadoop I/O API?
> I find HBase overkill for my application; something along the lines of
> HashStore (www.cellspark.com/hashstore.html) should be fine.
> Thanks,
> Delip
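For comparison, Hadoop's I/O package does ship `org.apache.hadoop.io.MapFile`, a sorted key/value file paired with an index that holds only a fraction of the keys. The idea can be sketched in plain Python as below. This is a hypothetical in-memory imitation, not the MapFile API: the class name `SparseIndexedTable` and the `interval` parameter are assumptions for illustration, and keys are assumed to be unique.

```python
import bisect
from typing import List, Optional, Sequence, Tuple

class SparseIndexedTable:
    """Hypothetical sketch: sorted records plus a sparse index of every `interval`-th key."""

    def __init__(self, records: Sequence[Tuple[str, str]], interval: int = 4):
        self.records: List[Tuple[str, str]] = sorted(records)
        self.interval = interval
        # Sparse index: only every interval-th key is kept in memory.
        self.index_keys = [self.records[i][0]
                           for i in range(0, len(self.records), interval)]

    def get(self, key: str) -> Optional[str]:
        # Locate the last indexed key <= key (the "seek" step).
        slot = bisect.bisect_right(self.index_keys, key) - 1
        if slot < 0:
            return None
        start = slot * self.interval
        # Scan at most `interval` records from there (the "read forward" step).
        for k, v in self.records[start:start + self.interval]:
            if k == key:
                return v
            if k > key:
                break
        return None
```

The sparse index trades a short linear scan for a much smaller in-memory index, which is what makes this layout attractive when the full key set is too large to hold in a mapper's heap.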
