cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: Efficient map reduce over ranges of Cassandra data
Date Fri, 11 Nov 2011 16:44:16 GMT
Nice!  Thanks Ed.

On Nov 10, 2011, at 11:20 PM, Edward Capriolo wrote:

> Hey all,
> 
> I know there are several tickets in the pipe that should make it possible to use secondary indexes to run map reduce jobs that do not have to ingest the entire dataset, such as:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-1600
> 
> I had ended up creating a sharded secondary index in user space (I just call it ordered buckets), described here:
> 
> http://www.slideshare.net/edwardcapriolo/casbase-presentation/27
> 
> Looking at the ordered buckets implementation, I realized it is a perfect candidate for "efficient map reduce" since it is easy to split.
> 
> A unit test of that implementation is here:
> 
> https://github.com/edwardcapriolo/casbase/blob/master/src/test/java/com/jointhegrid/casbase/hadoop/OrderedBucketInputFormatTest.java
> 
> With this you can currently do efficient map reduce on Cassandra data, while waiting for other integrated solutions to come along.
> 
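
To illustrate why an ordered set of buckets is "easy to split", here is a minimal sketch in Java. It is not taken from casbase: the class name BucketSplitSketch, the split method, and the date-style bucket keys are all hypothetical, and the real OrderedBucketInputFormat may work differently. The only assumption is that the bucket keys can be listed in order, so a job can hand each mapper one contiguous range of buckets and skip everything outside the range of interest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the casbase implementation.
class BucketSplitSketch {

    /**
     * Divide an ordered list of bucket keys into at most numSplits
     * contiguous ranges whose sizes differ by at most one.
     */
    static List<List<String>> split(List<String> bucketKeys, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        int n = bucketKeys.size();
        int base = n / numSplits;   // minimum number of buckets per split
        int extra = n % numSplits;  // the first 'extra' splits get one more
        int start = 0;
        for (int i = 0; i < numSplits && start < n; i++) {
            int size = base + (i < extra ? 1 : 0);
            // Copy the contiguous range so each split is independent.
            splits.add(new ArrayList<>(bucketKeys.subList(start, start + size)));
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed bucket keys: one bucket per day, already in order.
        List<String> buckets = Arrays.asList(
                "2011-11-01", "2011-11-02", "2011-11-03",
                "2011-11-04", "2011-11-05");
        for (List<String> range : split(buckets, 2)) {
            System.out.println(range);
        }
    }
}
```

In a Hadoop job, each returned range would correspond to one input split, so only the selected buckets are read rather than the entire column family.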

