From: Roland Hänel
To: user@cassandra.apache.org
Date: Sat, 27 Mar 2010 10:26:03 +0100
Subject: Re: Ring management and load balance

Mike,

If you can assume that your rows are roughly equal in size (at least statistically), then you could also just take a node's total load (this is exposed via JMX) and divide it by the number of keys/rows on that node. I'm not sure how to get the latter, but it shouldn't be a big deal to expose it via JMX if it isn't already there.
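As a rough sketch of that arithmetic (the function name and the figures are made up for illustration; the load would be read from JMX, and the key count from wherever it turns out to be available):

```python
def average_row_size(load_bytes, key_count):
    """Estimate the average row size on one node.

    load_bytes: the node's total load in bytes (assumed to be read from JMX).
    key_count:  number of keys/rows on that node (source of this value is
                an open question in this thread).
    """
    if key_count <= 0:
        raise ValueError("key_count must be positive")
    return load_bytes / key_count


# Hypothetical figures: 50 GiB of load spread over 10 million rows.
print(average_row_size(50 * 1024**3, 10_000_000))
```

This only means something if the rows really are similar in size; a few very large rows would skew the figure.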

Roland

On 26.03.2010 22:36, "Mike Malone" <mike@simplegeo.com> wrote:

2010/3/26 Roland Hänel <roland@haenel.me>

>
> Jonathan,
>
> I agree with your idea about a tool that could 'propose' good token choices for op...

With the random partitioner there's no need to suggest a token. The key space is statistically random, so you should be able to just split 2^128 into equal-sized segments and get fairly equal storage load. Your read / write load could get out of whack if you have hot spots and stuff, I guess. But for a large distributed data set I think that's unlikely.
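The even split could be sketched like this. The function name is hypothetical, and the default ring size simply reuses the 2^128 figure from above; the real token range depends on the partitioner implementation, so treat it as a parameter to check rather than a given:

```python
def even_tokens(node_count, ring_size=2**128):
    """Return node_count tokens spaced evenly over [0, ring_size).

    ring_size defaults to the 2^128 figure quoted in this thread; it is an
    assumption and should be adjusted to the partitioner's actual range.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Integer arithmetic keeps the tokens exact for very large ring sizes.
    return [i * ring_size // node_count for i in range(node_count)]


for token in even_tokens(4):
    print(token)
```

Each node would then be assigned one of the returned tokens, so that every node owns an equally sized slice of the key space.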

For order-preserving partitioners it's harder. We've been thinking about this issue at SimpleGeo and were planning on implementing an algorithm that could determine the median row key statistically without having to inspect every key. Basically, it would pull a random sample of row keys (maybe from the Index file?) and then determine the median of that sample. Thoughts?
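A minimal sketch of that sampling idea, assuming the row keys are available as some indexable sequence (standing in for entries read from an index file). The function name, the sample size, and the fake keys are all hypothetical; this is not an existing Cassandra API:

```python
import random


def estimate_median_key(keys, sample_size=1000, seed=None):
    """Estimate the median row key from a random sample of positions.

    keys is any indexable sequence of comparable keys. Only the sampled
    positions are read, so not every key has to be inspected.
    """
    if len(keys) == 0:
        raise ValueError("no keys to sample")
    rng = random.Random(seed)
    k = min(sample_size, len(keys))
    # Pick k distinct positions, then read only those keys.
    positions = rng.sample(range(len(keys)), k)
    sample = sorted(keys[p] for p in positions)
    return sample[k // 2]


# Hypothetical keys for illustration only.
fake_keys = ["user%05d" % i for i in range(10001)]
print(estimate_median_key(fake_keys, sample_size=500, seed=1))
```

The estimate gets closer to the true median as the sample grows; with a sample as large as the key set it is the exact median.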

Mike
