From: Mark Robson
To: user@cassandra.apache.org
Date: Mon, 26 Apr 2010 10:17:22 +0000
Subject: Re: when i use the OrderPreservingPartition, the load is very imbalance

On 26 April 2010 01:18, 刘兵兵 <rucbing@gmail.com> wrote:

> I am doing some INSERTs. Because I will do some scan operations, I use the
> OrderPreservingPartition method.
>
> The state of the cluster is shown below.
>
> As I predicted, the load is very imbalanced.

I think the solution to this would be to choose your nodes' tokens wisely
before you start inserting data and, if possible, to modify the keys so that
they split better between the nodes.

For example, suppose your key has two parts, one that you want to range scan
and one that you don't. Say you have a customer_id and a timestamp. The
customer ID does not need to be range scanned, so you can hash it into a hex
value (say), then append the customer ID and the timestamp (in a lexically
sortable way, of course). So you'd end up with keys like

HHHH-0012345-0001234567890

where HHHH is a hash of the customer ID, 0012345 is the customer ID, and the
rest is a timestamp.

You'd be able to do a time range scan for one customer by using the known
prefix (both the hash and the padded ID can be computed from the customer
ID), and spacing your nodes' tokens evenly from 0000 to ffff would result in
fairly even data (provided you don't have a very small number of very large
customers).

Mark
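To make the key layout concrete, here is a rough Python sketch of the scheme described above. It is only an illustration under assumptions of my own: the hash is MD5 truncated to four hex digits, the field widths are taken from the example key (7 digits for the customer ID, 13 for the timestamp), and the function names are hypothetical. Nothing here calls Cassandra itself; it only builds the strings you would use as row keys, scan bounds, and node tokens.

```python
import hashlib
from typing import List, Tuple


def hash_prefix(customer_id: int) -> str:
    """Return a 4-hex-digit prefix derived from the customer ID.

    Assumption: MD5 truncated to 4 hex digits. Any hash that spreads
    customers uniformly over 0000..ffff would do.
    """
    digest = hashlib.md5(str(customer_id).encode("utf-8")).hexdigest()
    return digest[:4]


def make_key(customer_id: int, timestamp: int) -> str:
    """Build a key shaped like HHHH-0012345-0001234567890.

    Zero-padding makes the numeric fields sort lexically in numeric
    order, which is what an order-preserving partitioner relies on.
    """
    return f"{hash_prefix(customer_id)}-{customer_id:07d}-{timestamp:013d}"


def scan_bounds(customer_id: int, start_ts: int, end_ts: int) -> Tuple[str, str]:
    """Return (start_key, end_key) for a time range scan of one customer."""
    return make_key(customer_id, start_ts), make_key(customer_id, end_ts)


def even_tokens(node_count: int) -> List[str]:
    """Return node_count tokens spaced evenly over 0000..ffff."""
    step = 0x10000 // node_count
    return [f"{i * step:04x}" for i in range(node_count)]


if __name__ == "__main__":
    print(make_key(12345, 1234567890))
    print(scan_bounds(12345, 1234500000, 1234599999))
    print(even_tokens(4))
```

Because the prefix depends only on the customer ID, all of one customer's rows stay contiguous in key order, so a time range scan is a single key range. The flip side is the caveat already mentioned: one very large customer still lands entirely on one node.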