incubator-cassandra-user mailing list archives

From Tim Wintle <timwin...@gmail.com>
Subject Re: Help for creating a custom partitioner
Date Fri, 28 Sep 2012 16:29:00 GMT
On Fri, 2012-09-28 at 18:20 +0200, Clement Honore wrote:
> Hi,
>
> I have hierarchical data.
>
> I'm storing them in CF with rowkey somewhat like (category, doc id), and
> plenty of columns for a doc definition.
>
> I have hierarchical data traversal too.
>
> The user just chooses one category, and then, interact with docs belonging
> only to this category.
>
> 1) If I use RandomPartitioner, all docs could be spread within all nodes in
> the cluster => bad performance.
>
> 2) Using RandomPartitioner, an alternative design could be rowkey=category
> and column name=(doc id, prop name)
>
> I don't want it because I need fixed column names for indexing purposes,
> and the "category" is quite a lonnnng string.
>
> 3) Then, I want to define a new partitioner for my rowkey (category, doc
> id), doing MD5 only for the "category" part.
>
> The question is : with such partitioner, many rows on *one* node are going
> to have the same MD5 value, as a result of this new partitioner.

If you do decide that having rows on the same node is what you want,
then you could take the higher bits of the hash from hashing the
category, and the lower bits of the hash from hashing the document id.

That would mean documents in a category would be close to each other in
the ring - while being unlikely to share the same hash.
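
As a rough sketch of that bit-splitting idea (this is not a Cassandra
partitioner, only the arithmetic; the 64/64 split and the names
composite_token, CATEGORY_BITS and DOC_BITS are assumptions made up for
illustration):

```python
import hashlib

TOKEN_BITS = 128      # an MD5 digest is 128 bits wide
CATEGORY_BITS = 64    # assumed split: top 64 bits come from the category
DOC_BITS = TOKEN_BITS - CATEGORY_BITS


def md5_int(data: bytes) -> int:
    """Return the MD5 digest of data as an unsigned 128-bit integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def composite_token(category: str, doc_id: str) -> int:
    """Build a token whose high bits depend only on the category
    and whose low bits depend only on the document id."""
    # Keep the top CATEGORY_BITS bits of the category hash.
    high = md5_int(category.encode("utf-8")) >> (TOKEN_BITS - CATEGORY_BITS)
    # Keep the bottom DOC_BITS bits of the document id hash.
    low = md5_int(doc_id.encode("utf-8")) & ((1 << DOC_BITS) - 1)
    return (high << DOC_BITS) | low
```

Two documents in the same category then share the upper half of the
token, so they sort next to each other, while the lower half still
differs unless the document id hashes collide in their low 64 bits.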


However, if you're doing this then all reads/writes to the category are
going to be to a single machine. That's not going to spread the load
across the cluster very well as I assume a few categories are going to
be far more popular than others.
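
To make the hot-spot concern concrete, here is a small sketch that
counts requests per node when placement depends on the category alone
versus on the full row key. Everything in it is hypothetical: the
workload (one popular category, ten quiet ones), the four equal-sized
token ranges, and the helper names node_for and load_per_node. It is a
toy model, not a measurement of a real cluster.

```python
import hashlib
from collections import Counter


def node_for(key: str, num_nodes: int) -> int:
    """Map a key to a node by splitting the 128-bit MD5 space
    into num_nodes equal ranges (a simplified ring)."""
    token = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    return (token * num_nodes) >> 128


def load_per_node(requests, num_nodes, by_category_only):
    """Count how many requests land on each node."""
    load = Counter()
    for category, doc_id in requests:
        key = category if by_category_only else f"{category}:{doc_id}"
        load[node_for(key, num_nodes)] += 1
    return load


# Hypothetical skewed workload: 900 requests to one category,
# 10 requests to each of 10 other categories.
requests = [("popular", str(i)) for i in range(900)]
requests += [(f"quiet-{c}", str(i)) for c in range(10) for i in range(10)]

by_category = load_per_node(requests, 4, by_category_only=True)
by_full_key = load_per_node(requests, 4, by_category_only=False)
```

With category-only placement, whichever node owns "popular" receives at
least 900 of the 1000 requests; comparing by_category with by_full_key
shows how much hashing the whole key evens that out.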

Have you tested that you actually get bad performance from
RandomPartitioner?

Tim 

