incubator-cassandra-user mailing list archives

From Matt Kennedy <>
Subject Re: map reduce job over indexed range of keys
Date Fri, 25 Feb 2011 00:45:52 GMT
Right, so I'm interpreting silence as a confirmation on all points. I

to work on these.

On Wed, Feb 23, 2011 at 5:31 PM, Matt Kennedy <> wrote:

> Let me start out by saying that I think I'm going to have to write a patch
> to get what I want, but I'm fine with that.  I just wanted to check here
> first to make sure that I'm not missing something obvious.
> I'd like to be able to run a MapReduce job that takes a value in an indexed
> column as a parameter, and use that to select the data that the MapReduce
> job operates on.  Right now, it looks like this isn't possible because
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader will only fetch data
> with get_range_slices, not get_indexed_slices.
> An example might be useful.  Let's say I want to run a MapReduce job over
> all the data for a particular country.  Right now I can do this in
> MapReduce by simply discarding all the data that is not from the country I
> want to process.  I suspect it will be faster if I can reduce the size of
> the MapReduce job by selecting only the data I want, using secondary
> indexes in Cassandra.
> So, first question: Am I wrong?  Is there some clever way to enable the
> behavior I'm looking for (without modifying the cassandra codebase)?
> Second question: If I'm not wrong, should I open a JIRA issue for this and
> start coding up this feature?
> Finally, the real reason that I want to get this working is so that I can
> enhance the CassandraStorage pig loadfunc so that it can take query
> parameters in the URL string that is used to specify the keyspace and
> column family.  So for example, you might load data into Pig with this
> syntax:
> rows = LOAD 'cassandra://mykeyspace/mycolumnfamily?country=UK' using
> CassandraStorage();
> I'd like to get some feedback on that syntax.
> Thanks,
> Matt Kennedy
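
To make the proposed URL syntax concrete, here is a minimal sketch of how a location string such as the one in the LOAD example could be split into a keyspace, a column family, and a set of column=value filters. This is not the CassandraStorage implementation: the class name CassandraLocation and its fields are hypothetical, and the sketch assumes each query parameter means an equality match on an indexed column.

```java
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Cassandra or Pig.
final class CassandraLocation {
    final String keyspace;
    final String columnFamily;
    // Column name -> required value, in the order given in the URL.
    final Map<String, String> filters;

    private CassandraLocation(String keyspace, String columnFamily,
                              Map<String, String> filters) {
        this.keyspace = keyspace;
        this.columnFamily = columnFamily;
        this.filters = Collections.unmodifiableMap(filters);
    }

    // Parses cassandra://<keyspace>/<columnfamily>[?col=value[&col=value...]]
    static CassandraLocation parse(String location) {
        // URI.create throws IllegalArgumentException on malformed input.
        URI uri = URI.create(location);
        if (!"cassandra".equals(uri.getScheme())) {
            throw new IllegalArgumentException(
                "expected cassandra:// scheme: " + location);
        }
        // getAuthority (rather than getHost) also accepts names that are
        // not valid hostnames, e.g. keyspaces containing underscores.
        String keyspace = uri.getAuthority();
        String path = uri.getPath();
        if (keyspace == null || path == null || path.length() < 2
                || path.indexOf('/', 1) >= 0) {
            throw new IllegalArgumentException(
                "expected cassandra://<keyspace>/<columnfamily>: " + location);
        }
        String columnFamily = path.substring(1); // drop the leading '/'

        Map<String, String> filters = new LinkedHashMap<>();
        // getQuery returns the percent-decoded query; this sketch does not
        // handle escaped '&' or '=' inside names or values.
        String query = uri.getQuery();
        if (query != null && !query.isEmpty()) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException(
                        "expected column=value: " + pair);
                }
                filters.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return new CassandraLocation(keyspace, columnFamily, filters);
    }
}
```

With the example location from the message, parse would yield keyspace "mykeyspace", column family "mycolumnfamily", and a single filter country=UK. Under the assumption above, each filter entry would presumably become one equality expression handed to get_indexed_slices, while a location without a query string would fall back to the existing get_range_slices path.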
