Sender: scode@scode.org
In-Reply-To: <201011101435376177031@ihep.ac.cn>
References: <201011101435376177031@ihep.ac.cn>
Date: Wed, 10 Nov 2010 15:11:38 +0100
Message-ID: <AANLkTimsdbdVxzKCxPE=iuySm+4BqV93+r+2jUsAnnug@mail.gmail.com>
Subject: Re: about key sorting and token partitioning
From: Peter Schuller <peter.schuller@infidyne.com>
To: user@cassandra.apache.org
Content-Type: text/plain; charset=UTF-8

> I am using cassandra to store a message stream, and want to use timestamps
> (like yyyymmddhhMIss or something alike) as the keys.
> So if I use RandomPartitioner, I will lose the order when using
> get_range_slices().
> If I use OrderPreservingPartitioner, how should I configure cassandra to
> make load balance between the nodes?

AFAIK there's no silver bullet to making the order preserving
partitioner easy to use w.r.t. node balancing in the situation you're
describing.

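One thing to consider is to use the random partitioner (for its
simplicity in managing the cluster) and use a coarser-grained prefix of
the timestamp as the row key. For example, you could have the row key
be yyyymmddhh to get an entire hour per row, with each message stored
as a column named by its full timestamp so that the columns within a
row stay sorted by time.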

A reasonable granularity depends on your use-case, but the idea is to
keep the simplicity of the random partitioner while still getting
reasonably efficient range slices: if each row covers a fairly large
time range, any additional overhead in jumping across nodes from one
row to the next is negligible in comparison to the other work done.
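
As a rough sketch of the bucketing (Python; the helper names row_key,
column_name and row_keys_for_range are made up for illustration, and no
actual Cassandra client calls are shown), assuming hourly rows and
second-resolution column names:

```python
from datetime import datetime, timedelta

# Assumed formats: yyyymmddhh for the row key (one row per hour) and
# yyyymmddhhMIss for the column name (full timestamp of the message).
ROW_KEY_FORMAT = "%Y%m%d%H"
COLUMN_NAME_FORMAT = "%Y%m%d%H%M%S"


def row_key(ts):
    """Hypothetical helper: the hourly bucket a message belongs to."""
    return ts.strftime(ROW_KEY_FORMAT)


def column_name(ts):
    """Hypothetical helper: fixed-width name, so string order is time order."""
    return ts.strftime(COLUMN_NAME_FORMAT)


def row_keys_for_range(start, end):
    """Row keys of every hourly bucket overlapping [start, end], in time order."""
    current = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current <= end:
        keys.append(row_key(current))
        current += timedelta(hours=1)
    return keys


start = datetime(2010, 11, 10, 14, 35, 37)
end = datetime(2010, 11, 10, 16, 5, 0)
keys = row_keys_for_range(start, end)
print(keys)  # ['2010111014', '2010111015', '2010111016']
```

A reader would then fetch those rows by key, in that order, taking a
column slice from column_name(start) to column_name(end) in each, so
the ordering comes from the client and the columns rather than from the
partitioner. Note that with second resolution two messages in the same
second would share a column name; you would need some extra suffix to
keep them apart.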

-- 
/ Peter Schuller