Thanks! Very helpful.


On Mon, Dec 3, 2012 at 4:04 PM, aaron morton <aaron@thelastpickle.com> wrote:
For background, you may find the wide row setting useful: http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration
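
Enabling it is a one-liner, something like this (a minimal sketch assuming Cassandra 1.1's ConfigHelper and a Hadoop Job named "job", as in your snippet; the keyspace and column family names here are made up):

    // The final boolean turns on wide row support for the input format
    ConfigHelper.setInputColumnFamily(
        job.getConfiguration(), "MyKeyspace", "Events", true);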

AFAIK all the Hadoop input row readers do range scans. And I think support for setting the start and end token exists so that jobs only select data that is local to the node; it's not really possible to select individual rows by token.

If you had a secondary index on a column, you could use the ConfigHelper.setInputRange overload that takes index expressions.
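
Something along these lines (an untested sketch: "bucket" is a hypothetical indexed column, and it assumes Cassandra 1.1's Thrift IndexExpression/IndexOperator classes):

    import java.util.Arrays;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.IndexExpression;
    import org.apache.cassandra.thrift.IndexOperator;
    import org.apache.cassandra.utils.ByteBufferUtil;

    // Sketch only: "bucket" is a hypothetical column with a secondary index.
    // Cassandra requires at least one EQ clause on an indexed column.
    IndexExpression expr = new IndexExpression(
        ByteBufferUtil.bytes("bucket"),     // indexed column name
        IndexOperator.EQ,                   // equality operator
        ByteBufferUtil.bytes(1353456000));  // the hour bucket to match
    ConfigHelper.setInputRange(job.getConfiguration(), Arrays.asList(expr));

The idea is that the expression is applied server side as a filter, so each task should only see matching rows rather than the whole token range.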

Or it may be easier to use Hive.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/12/2012, at 3:04 PM, Jamie Rothfeder <jamie.rothfeder@gmail.com> wrote:

> Hey All,
>
> I have a bunch of time-series data stored in a cluster using a ByteOrderedPartitioner. My keys are time buckets representing events that occurred in an hour. I've been trying to write a MapReduce job that considers only events within a certain time range by specifying an input range, but this doesn't seem to be working.
>
> I expect the following code to scan data for a single key (1353456000), but it is scanning all keys.
>
> int key = 1353456000;
> IPartitioner part = ConfigHelper.getInputPartitioner(job.getConfiguration());
> Token token = part.getToken(ByteBufferUtil.bytes(key));
> ConfigHelper.setInputRange(job.getConfiguration(), part.getTokenFactory().toString(token), part.getTokenFactory().toString(token));
>
> Any idea what I'm doing wrong?
>
> Thanks,
> Jamie