incubator-cassandra-user mailing list archives

From Andrey Ilinykh <ailin...@gmail.com>
Subject Re: Why data is not even distributed.
Date Thu, 04 Oct 2012 21:33:44 GMT
That was my first thought too.
So I took the MD5 digest of each UUID and used the digest as the row key:

MessageDigest md = MessageDigest.getInstance("MD5");

//in the loop
UUID uuid = UUID.randomUUID();
byte[] bytes = md.digest(asByteArray(uuid));

The result is exactly the same: the first node takes 66%, the second 33%, and
the third one is almost empty. For some reason, rows that should be placed on
the third node end up on the first one.

Address         DC          Rack        Status State   Load      Effective-Ownership Token
                                                                                     Token(bytes[56713727820156410577229101238628035242])
127.0.0.1       datacenter1 rack1       Up     Normal  7.68 MB   33.33%              Token(bytes[00])
127.0.0.3       datacenter1 rack1       Up     Normal  79.17 KB  33.33%              Token(bytes[0113427455640312821154458202477256070485])
127.0.0.2       datacenter1 rack1       Up     Normal  3.81 MB   33.33%              Token(bytes[56713727820156410577229101238628035242])
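
The experiment above can be reproduced offline with a minimal sketch like the one below. It rests on assumptions that are not stated in this thread: that the text inside bytes[...] is a hexadecimal byte string, that keys are compared as unsigned bytes, and that a key belongs to the first node whose token is greater than or equal to it (wrapping around past the largest token). The class name TokenRangeSketch and the asByteArray implementation are hypothetical stand-ins; the original helper is not shown.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class TokenRangeSketch {
    // Node addresses and tokens copied from the ring output, in ring order.
    static final String[] NODES = {"127.0.0.1", "127.0.0.3", "127.0.0.2"};
    // ASSUMPTION: the text inside bytes[...] is a hex string, not a decimal number.
    static final byte[][] TOKENS = {
        hexToBytes("00"),
        hexToBytes("0113427455640312821154458202477256070485"),
        hexToBytes("56713727820156410577229101238628035242")
    };

    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Hypothetical stand-in for the asByteArray helper, which the message does not show.
    static byte[] asByteArray(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Unsigned lexicographic comparison; a strict prefix sorts before the longer array.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    // ASSUMPTION: a key belongs to the first node whose token is >= the key;
    // keys above the largest token wrap around to the node with the smallest token.
    static int ownerIndex(byte[] key) {
        for (int i = 0; i < TOKENS.length; i++) {
            if (compareUnsigned(key, TOKENS[i]) <= 0) return i;
        }
        return 0;
    }

    // Hashes `rows` random UUIDs with MD5 and counts how many keys each node would own.
    static int[] simulate(int rows) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        int[] counts = new int[NODES.length];
        for (int i = 0; i < rows; i++) {
            byte[] key = md.digest(asByteArray(UUID.randomUUID()));
            counts[ownerIndex(key)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int rows = 100000;
        int[] counts = simulate(rows);
        for (int i = 0; i < NODES.length; i++) {
            System.out.printf("%s %.2f%%%n", NODES[i], 100.0 * counts[i] / rows);
        }
    }
}
```

Under those assumptions the three ranges are far from equal in size: roughly two thirds of uniformly distributed keys fall above the token starting with 0x56 and wrap to 127.0.0.1, about one third fall to 127.0.0.2, and well under 1% fall to 127.0.0.3. That would be consistent with the loads shown, but it is only a sketch, not a confirmed diagnosis.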



On Thu, Oct 4, 2012 at 12:33 AM, Tom <fivemiletom@gmail.com> wrote:
> Hi Andrey,
>
> While the data values you generated might follow a truly random
> distribution, your row key, the UUID, does not (because it is created on the same
> machines by the same software within a certain window of time).
>
> For example, if you are using the UUID class in Java, these are
> composed of several components (related to dimensions such as time and
> version), so you cannot expect a random distribution over the whole space.
>
>
> Cheers
> Tom
>
>
>
>
> On Wed, Oct 3, 2012 at 5:39 PM, Andrey Ilinykh <ailinykh@gmail.com> wrote:
>>
>> Hello, everybody!
>>
>> I'm observing very strange behavior. I have a 3-node cluster with
>> ByteOrderedPartitioner (I run 1.1.5).
>> I created a keyspace with a replication factor of 1.
>> Then I created one column family and populated it with random data.
>> I use a UUID as the row key and an Integer as the column name.
>> Row keys were generated as
>>
>> UUID uuid = UUID.randomUUID();
>>
>> I populated about 100000 rows with 100 columns each.
>>
>> I would expect equal load on each node, but the result is totally
>> different. This is what nodetool gives me:
>>
>> Address         DC          Rack        Status State   Load      Effective-Ownership Token
>>                                                                                      Token(bytes[56713727820156410577229101238628035242])
>> 127.0.0.1       datacenter1 rack1       Up     Normal  27.61 MB  33.33%              Token(bytes[00])
>> 127.0.0.3       datacenter1 rack1       Up     Normal  206.47 KB 33.33%              Token(bytes[0113427455640312821154458202477256070485])
>> 127.0.0.2       datacenter1 rack1       Up     Normal  13.86 MB  33.33%              Token(bytes[56713727820156410577229101238628035242])
>>
>>
>> One node (127.0.0.3) is almost empty.
>> Any ideas what is wrong?
>>
>>
>> Thank you,
>>   Andrey
>
>
