hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Indexed Hashtables
Date Thu, 15 Jan 2009 11:31:39 GMT
Sean Shanny wrote:
> Delip,
> 
> So far we have had pretty good luck with memcached.  We are building a 
> hadoop based solution for data warehouse ETL on XML based log files that 
> represent click stream data on steroids.
> 
> We process about 34 million records or about 70 GB data a day.  We have 
> to process dimensional data in our warehouse and then load the surrogate 
> <key><value> pairs in memcached so we can traverse the XML files once 
> again to perform the substitutions.  We are using the memcached solution 
> because it scales out just like hadoop.  We will have code that allows 
> us to fall back to the DB if the memcached lookup fails but that should 
> not happen too often.
> 

LinkedIn have just opened up something they run internally, Project 
Voldemort:

http://highscalability.com/product-project-voldemort-distributed-database
http://project-voldemort.com/

It's a DHT, Java based. I haven't played with it yet, but it looks like 
a good part of the portfolio.
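
For illustration, the cache-first lookup with a database fallback that Sean 
describes could be sketched roughly as below. This is a minimal sketch, not 
his code: the class name SurrogateKeyLookup, the Map standing in for a 
memcached client, and the Function standing in for the warehouse DB query 
are all hypothetical placeholders.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: resolves a natural key to a surrogate key,
// trying the cache first and falling back to the database on a miss.
class SurrogateKeyLookup {
    // Stand-in for a memcached client (assumption: any key/value store works here).
    private final Map<String, Long> cache;
    // Stand-in for a DB query that returns the surrogate key, or null if unknown.
    private final Function<String, Long> dbLookup;
    // Counts how often the fallback path was taken, to check it stays rare.
    private int fallbackCount = 0;

    SurrogateKeyLookup(Map<String, Long> cache, Function<String, Long> dbLookup) {
        this.cache = cache;
        this.dbLookup = dbLookup;
    }

    Long resolve(String naturalKey) {
        Long surrogate = cache.get(naturalKey);
        if (surrogate != null) {
            return surrogate;                 // cache hit
        }
        fallbackCount++;                      // cache miss: go to the DB
        surrogate = dbLookup.apply(naturalKey);
        if (surrogate != null) {
            cache.put(naturalKey, surrogate); // repopulate for later lookups
        }
        return surrogate;                     // may be null if the DB has no entry
    }

    int getFallbackCount() {
        return fallbackCount;
    }
}
```

Tracking the number of fallbacks is one simple way to confirm that cache 
misses really are the exception during the second pass over the XML files.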

