Date: Wed, 10 Nov 2010 15:11:38 +0100
From: Peter Schuller
To: user@cassandra.apache.org
Subject: Re: about key sorting and token partitioning

> I am using Cassandra to store a message stream, and want to use timestamps
> (like yyyymmddhhMIss or something alike) as the keys.
> So if I use RandomPartitioner, I will lose the order when using
> get_range_slices().
> If I use OrderPreservingPartitioner, how should I configure Cassandra to
> balance load between the nodes?

AFAIK there's no silver bullet for making the order-preserving
partitioner easy to use with respect to node balancing in the situation
you're describing.

One thing to consider is to use the random partitioner (for its
simplicity in managing the cluster) and use a coarser-grained prefix of
the timestamp as the row key. For example, you could make the row key
yyyymmddhh to get an entire hour per row.

A reasonable granularity depends on your use case, but the idea is to
take advantage of the simplicity of the random partitioner while keeping
range slices reasonably efficient: if each row covers a fairly large
time range, any additional overhead from jumping across nodes is
negligible compared to the other work done.

--
/ Peter Schuller
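The hour-per-row idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a Cassandra client: it assumes hourly buckets (yyyymmddhh row keys) with the full yyyymmddhhMIss timestamp as the column name inside each row, and it uses a hypothetical in-memory class, `HourlyBucketStore`, in place of a real column family. A time-range read becomes one lookup per hourly row, visited in time order, so the random partitioner scattering those rows across nodes does not lose the overall ordering.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed granularity: one row per hour; full timestamp as the column name.
ROW_KEY_FORMAT = "%Y%m%d%H"
COLUMN_FORMAT = "%Y%m%d%H%M%S"


def row_key(ts: datetime) -> str:
    """Bucket a timestamp into its hourly row key, e.g. '2010111014'."""
    return ts.strftime(ROW_KEY_FORMAT)


def row_keys_for_range(start: datetime, end: datetime) -> list[str]:
    """All hourly row keys overlapping [start, end] (inclusive), in time order."""
    keys = []
    cur = start.replace(minute=0, second=0, microsecond=0)
    while cur <= end:
        keys.append(row_key(cur))
        cur += timedelta(hours=1)
    return keys


class HourlyBucketStore:
    """Hypothetical in-memory stand-in: row key -> sorted (column name, message) list."""

    def __init__(self) -> None:
        self.rows: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def insert(self, ts: datetime, message: str) -> None:
        # Keep each row's columns sorted by name, mimicking column ordering in a row.
        insort(self.rows[row_key(ts)], (ts.strftime(COLUMN_FORMAT), message))

    def range_query(self, start: datetime, end: datetime) -> list[str]:
        lo = start.strftime(COLUMN_FORMAT)
        hi = end.strftime(COLUMN_FORMAT)
        out = []
        # Where each row lives on the ring is irrelevant here: rows are read
        # by key, one per hour, in time order.
        for key in row_keys_for_range(start, end):
            for name, msg in self.rows.get(key, []):
                if lo <= name <= hi:
                    out.append(msg)
        return out
```

The bucket size is the tuning knob the email refers to: larger buckets mean fewer row lookups per range read but bigger rows, so the right choice depends on message volume per hour and typical query span.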