cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-1473) Implement a Cassandra aware Hadoop mapreduce.Partitioner
Date Fri, 12 Aug 2011 00:57:27 GMT


Jonathan Ellis commented on CASSANDRA-1473:

Actually even for RP I don't see how to make this useful.

I was thinking md5(key) % partitions but that's not actually going to group the keys by node
at all.  It's _a_ partitioning but not a _useful_ one. :)
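
To illustrate the point above, here is a minimal sketch, not Cassandra or Hadoop code. It assumes a hypothetical ring of evenly spaced nodes with a single replica each, and it approximates the RandomPartitioner token by folding the MD5 digest into [0, 2**127). The names token, owner, modulo_partition, range_partition and owners_per_partition are made up for the example. It records, for each partition, which nodes own the keys that landed in it. The range-based split is included only as a contrast showing what "grouped by node" would look like under these assumptions.

```python
import hashlib

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def token(key: bytes) -> int:
    """Approximate RandomPartitioner-style token: MD5 digest folded into the ring."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE


def owner(tok: int, nodes: int) -> int:
    """Hypothetical ring: node i owns an equal, contiguous slice of the token space."""
    return tok * nodes // RING_SIZE


def modulo_partition(key: bytes, partitions: int) -> int:
    """The md5(key) % partitions idea: depends on the low bits of the token."""
    return token(key) % partitions


def range_partition(key: bytes, partitions: int) -> int:
    """Contrast case: split the token space into contiguous ranges."""
    return token(key) * partitions // RING_SIZE


def owners_per_partition(keys, partition_fn, partitions, nodes):
    """Map each partition to the set of nodes owning the keys assigned to it."""
    seen = {p: set() for p in range(partitions)}
    for k in keys:
        seen[partition_fn(k, partitions)].add(owner(token(k), nodes))
    return seen


keys = [f"key{i}".encode() for i in range(10000)]
by_modulo = owners_per_partition(keys, modulo_partition, 4, 4)
by_range = owners_per_partition(keys, range_partition, 4, 4)

for p in range(4):
    print(p, "modulo ->", sorted(by_modulo[p]), "range ->", sorted(by_range[p]))
```

Under these assumptions each modulo partition holds keys owned by every node, so a writer handling that partition would still talk to the whole cluster, while each range partition maps to a single node. A real ring has uneven token assignments and replication, which this sketch ignores.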

> Implement a Cassandra aware Hadoop mapreduce.Partitioner
> --------------------------------------------------------
>                 Key: CASSANDRA-1473
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Stu Hood
>            Assignee: Patricio Echague
>             Fix For: 1.0
> When using an IPartitioner that does not sort data in byte order (RandomPartitioner, for
> example) with Cassandra's Hadoop integration, Hadoop is unaware of the output order of the
> data.
> We can make Hadoop aware of the proper order of the output data by implementing Hadoop's
> mapreduce.Partitioner interface: then Hadoop will handle sorting all of the data according
> to Cassandra's IPartitioner, and the writing clients will be able to connect to smaller numbers
> of Cassandra nodes.
