From: Tom
To: user@cassandra.apache.org
Date: Thu, 4 Oct 2012 00:33:17 -0700
Subject: Re: Why data is not even distributed.

Hi Andrey,

while the data values you generated might follow a truly random distribution, your row key, the UUID, does not (because the keys are created on the same machines by the same software within a certain window of time).
For example, if you are using the UUID class in Java, the keys are composed of several components (related to dimensions such as time and version), so you cannot expect a random distribution over the whole space.
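
One way to see which parts of a key are fixed is to inspect a generated UUID directly. The following is only a sketch; the class name UuidLayoutCheck and the helper firstByte are made up for illustration. According to the Java documentation, UUID.randomUUID() returns a type 4 UUID, so the version and variant fields are fixed while the remaining bits are pseudo-random. The first byte is printed because it dominates a byte-ordered comparison of keys.

```java
import java.util.UUID;

// Hypothetical helper class, not part of Cassandra or of the original post.
class UuidLayoutCheck {
    // Returns the most significant byte of the UUID as a value in 0..255.
    // Under a byte-ordered comparison this byte is compared first.
    static int firstByte(UUID uuid) {
        return (int) (uuid.getMostSignificantBits() >>> 56) & 0xFF;
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        // version() and variant() expose the fixed fields of the UUID layout.
        System.out.println("uuid       = " + uuid);
        System.out.println("version    = " + uuid.version());
        System.out.println("variant    = " + uuid.variant());
        System.out.println("first byte = " + firstByte(uuid));
    }
}
```

Running this a few times shows which fields stay constant between keys and how much the leading byte varies.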

Cheers
Tom



On Wed, Oct 3, 2012 at 5:39 PM, Andrey Ilinykh <ailinykh@gmail.com> wrote:
Hello, everybody!

I'm observing very strange behavior. I have a 3-node cluster with
ByteOrderedPartitioner (I run 1.1.5).
I created a keyspace with a replication factor of 1.
Then I created one column family and populated it with random data.
I use UUID as a row key, and Integer as a column name.
Row keys were generated as

UUID uuid = UUID.randomUUID();

I populated about 100000 rows with 100 columns each.

I would expect equal load on each node, but the result is totally
different. This is what nodetool gives me:

Address     DC           Rack   Status  State   Load       Effective-Ownership  Token
                                                                                Token(bytes[56713727820156410577229101238628035242])
127.0.0.1   datacenter1  rack1  Up      Normal  27.61 MB   33.33%               Token(bytes[00])
127.0.0.3   datacenter1  rack1  Up      Normal  206.47 KB  33.33%               Token(bytes[0113427455640312821154458202477256070485])
127.0.0.2   datacenter1  rack1  Up      Normal  13.86 MB   33.33%               Token(bytes[56713727820156410577229101238628035242])


One node (127.0.0.3) is almost empty.
Any ideas what is wrong?


Thank you,
Andrey
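
It may also be worth checking how much of the key space each token in the nodetool output above actually covers. The sketch below is a rough estimate under assumptions that are not stated in the thread: that the text inside Token(bytes[...]) is a hex byte string, that keys are compared byte by byte, that a node owns the range from the previous token (exclusive) up to its own token (inclusive), and that keys are spread uniformly over the byte space. The class TokenRangeShare and its methods are hypothetical names used only for illustration.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of Cassandra or of the original post.
class TokenRangeShare {
    // Maps a hex token to a fraction of the byte-ordered key space in [0, 1),
    // using only its first 6 bytes (right-padded with zeros if shorter).
    static double toFraction(String hexToken) {
        String padded = (hexToken + "000000000000").substring(0, 12);
        return Long.parseLong(padded, 16) / (double) (1L << 48);
    }

    // Returns the share of the key space per token, in ascending token order.
    // Assumes each node owns (previous token, own token], and that the lowest
    // token additionally takes the wrap-around range past the highest token.
    static double[] shares(String[] hexTokens) {
        double[] f = Arrays.stream(hexTokens)
                .mapToDouble(TokenRangeShare::toFraction)
                .sorted()
                .toArray();
        double[] share = new double[f.length];
        for (int i = 0; i < f.length; i++) {
            double prev = (i == 0) ? f[f.length - 1] - 1.0 : f[i - 1];
            share[i] = f[i] - prev;
        }
        return share;
    }

    public static void main(String[] args) {
        // Tokens copied from the nodetool output in this thread.
        String[] tokens = {
            "00",
            "0113427455640312821154458202477256070485",
            "56713727820156410577229101238628035242"
        };
        double[] share = shares(tokens);
        for (int i = 0; i < share.length; i++) {
            System.out.printf("token #%d (ascending): %.2f%%%n", i, share[i] * 100);
        }
    }
}
```

If these assumptions hold, the three ranges come out at roughly 66%, 0.4% and 33% of the key space for the tokens of 127.0.0.1, 127.0.0.3 and 127.0.0.2 respectively, which is close to the ratio of the reported loads (27.61 MB, 206.47 KB, 13.86 MB).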
