cassandra-commits mailing list archives

From "Jonathan Ellis (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2878) Allow CQL-based map/reduce
Date Sat, 07 Jan 2012 06:49:39 GMT


Jonathan Ellis commented on CASSANDRA-2878:

There's one wrinkle with doing M/R over CQL -- we need to split the input space up into token-delineated
ranges, since key order may not be partitioner order.
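To make the ordering problem concrete, here is a minimal Python sketch (Python chosen only for illustration; Cassandra itself is Java). It assumes a RandomPartitioner-style token, modeled on the absolute value of the key's MD5 digest read as a signed integer, which gives tokens in [0, 2**127]. The helper names (`token`, `split_ring`, `split_index`, `group_by_split`) are hypothetical, and the half-open `[start, end)` ranges are a simplification chosen for this sketch.

```python
import hashlib

# Assumption: tokens are modeled on RandomPartitioner, i.e. the absolute value
# of the key's MD5 digest read as a signed big-endian integer.
RING_MAX = 2 ** 127


def token(key: bytes) -> int:
    """Hash a row key to a token in [0, RING_MAX]."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def split_ring(n: int, ring_max: int = RING_MAX) -> list[tuple[int, int]]:
    """Cut [0, ring_max] into n contiguous half-open [start, end) ranges."""
    bounds = [(ring_max + 1) * i // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n)]


def split_index(t: int, splits: list[tuple[int, int]]) -> int:
    """Return the index of the split whose range contains token t."""
    for i, (start, end) in enumerate(splits):
        if start <= t < end:
            return i
    raise ValueError(f"token {t} is outside every split")


def group_by_split(keys: list[bytes], splits: list[tuple[int, int]]) -> list[list[bytes]]:
    """Assign each key to the split that owns its token (one split per map task)."""
    groups: list[list[bytes]] = [[] for _ in splits]
    for key in keys:
        groups[split_index(token(key), splits)].append(key)
    return groups


if __name__ == "__main__":
    keys = [b"alice", b"bob", b"carol", b"dave"]
    splits = split_ring(4)
    print("lexical order:", keys)
    print("token order:  ", sorted(keys, key=token))
    for i, group in enumerate(group_by_split(keys, splits)):
        print(f"split {i}: {group}")
```

Because the token is a hash, keys that are adjacent in lexical order typically land in different splits, so each map task has to be bounded by tokens rather than by keys.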

I see a few options:
# Add a "private" CQL thrift method that takes token ranges as well as the query string
# Add some kind of syntax to CQL to support query-by-token, e.g., "WHERE token(user_id) >=
2300183742897592" [here user_id is the key alias]
# Parse the CQL query in CqlRecordReader and turn it into a Thrift get_range_slices call (which
is similar to, but can't share much code with, QueryProcessor turning CQL queries into StorageProxy calls)
# Drop the idea of adding a CqlInputFormat and just add configuration parameters for KeyRange
to ColumnFamilyInputFormat
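Option 2 would let the record reader keep the user's CQL and only attach a token restriction to it per split. The sketch below, in Python for illustration, shows what generating those per-split queries could look like. Only the `token(user_id) >= ...` form comes from the proposal above; the `users` table, the `< end` upper bound, and the helper names are hypothetical, and the naive string concatenation assumes the base query has no WHERE clause of its own.

```python
# Hypothetical sketch of option 2: per-split queries using the proposed
# "WHERE token(<key alias>) >= ..." syntax. The table name, helper names, and
# the upper-bound clause are illustrative assumptions, not existing CQL.
RING_MAX = 2 ** 127


def token_bounded_query(base_select: str, key_alias: str, start: int, end: int) -> str:
    """Append a token-range restriction; assumes base_select has no WHERE clause."""
    return (f"{base_select} WHERE token({key_alias}) >= {start} "
            f"AND token({key_alias}) < {end}")


def split_queries(base_select: str, key_alias: str, n: int,
                  ring_max: int = RING_MAX) -> list[str]:
    """One query per map task, covering [0, ring_max] with no gaps or overlap."""
    bounds = [(ring_max + 1) * i // n for i in range(n + 1)]
    return [token_bounded_query(base_select, key_alias, bounds[i], bounds[i + 1])
            for i in range(n)]


if __name__ == "__main__":
    for q in split_queries("SELECT * FROM users", "user_id", 3):
        print(q)
```

A real implementation would have to merge the token bounds into any existing WHERE clause rather than simply append them.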

None of these are awesome. 4 is probably the most straightforward, but leaves us SOL for
wide rows, while a CQL inputformat can solve that as well (CASSANDRA-2474). 3 has the same
problem of not generalizing to 2474. 2 feels cleanest in some ways, but I've never been thrilled
with adding query-by-token to Thrift either, since it lends itself to abuse (CASSANDRA-1978).
Which brings us back to 1, but then we're stuck supporting that "hack" post-Thrift as well.

> Allow CQL-based map/reduce
> --------------------------
>                 Key: CASSANDRA-2878
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>            Reporter: Mck SembWever
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.1
> Currently, when running a MapReduce job against data in a Cassandra data store, it reads
> through all the data for a particular ColumnFamily. This could be optimized to only read
> through those rows that have to do with the query.
> Adding CQL support to m/r will allow using an index more simply than trying to cram support
> for more parameters into the job configuration.

This message is automatically generated by JIRA.

