Subject: Re: Cluster key distribution wrong after upgrading to 0.8.4
From: aaron morton <aaron@thelastpickle.com>
Date: Tue, 23 Aug 2011 10:11:00 +1200
To: user@cassandra.apache.org

Not sure why it's different for the nodes at the end of the ring. But I'm going to assume Quorum is working as expected and it's an artifact of the way the ring ownership is calculated.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 10:07 PM, Thibaut Britz wrote:

> Hi,
>
> Thanks for explaining. As I understand it, each node now only displays
> its local view of the data it contains, and not the global view
> anymore.
>
> One more question:
> Why do the nodes at the end of the ring only show the % load from 2
> nodes and not from 3?
> We always write with quorum, so there should also be data on the
> adjacent nodes. Or are the quorum writes not working as expected
> (writing to only 2 nodes instead of 3) at the beginning and end of the
> cluster?
>
> Thanks,
> Thibaut
>
>
> On Mon, Aug 22, 2011 at 12:01 AM, aaron morton wrote:
>> I'm not sure what the fix is.
>>
>> When using an order preserving partitioner it's up to you to ensure the ring is correctly balanced.
>>
>> Say you have the following setup…
>>
>> node : token
>> 1 : a
>> 2 : h
>> 3 : p
>>
>> If keys are always 1 character, we can say each node owns roughly 33% of the ring, because we know there are only 26 possible keys.
>>
>> With the RP we know how many possible tokens there are: the output of the md5 calculation is a 128 bit integer. So we can say what fraction of the total each range is.
>>
>> If, in the example above, keys can be of any length, how many values exist between a and h?
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 22/08/2011, at 3:33 AM, Thibaut Britz wrote:
>>
>>> Hi,
>>>
>>> I will wait until this is fixed before I upgrade, just to be sure.
>>>
>>> Shall I open a new ticket for this issue?
>>>
>>> Thanks,
>>> Thibaut
>>>
>>> On Sun, Aug 21, 2011 at 11:57 AM, aaron morton wrote:
>>>> This looks like an artifact of the way ownership is calculated for the OPP.
>>>> See https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java#L177
>>>> It was changed in this ticket:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-2800
>>>> The change applied in CASSANDRA-2800 was not applied to the
>>>> AbstractByteOrderedPartitioner. Looks like it should have been. I'll chase
>>>> that up.
>>>>
>>>> When each node calculates the ownership of the token ranges (for OPP and
>>>> BOP) it's based on the number of keys the node has in that range, as there
>>>> is no way for the OPP to know the range of values the keys may take.
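(A sketch of the contrast described above, assuming Python and illustrative token values; this is not Cassandra's actual code. With the RandomPartitioner the token space is a fixed interval, so ownership is pure arithmetic; an order-preserving partitioner has no fixed token space, which is why it must fall back to counting local keys.)

```python
# Simplified model of RandomPartitioner ownership: md5 tokens live in
# the known, fixed interval [0, 2**127), so a node's share of the ring
# is just the width of its token range divided by the total space.
TOKEN_SPACE = 2 ** 127

def rp_ownership(prev_token: int, token: int) -> float:
    """Fraction of the ring owned by the node at `token`, whose
    range is (prev_token, token], wrapping around at 2**127."""
    width = (token - prev_token) % TOKEN_SPACE
    return width / TOKEN_SPACE

# Three evenly spaced tokens each own roughly a third of the ring:
tokens = [0, TOKEN_SPACE // 3, 2 * TOKEN_SPACE // 3]
fractions = [rp_ownership(tokens[i - 1], tokens[i]) for i in range(len(tokens))]
print([round(f, 2) for f in fractions])  # [0.33, 0.33, 0.33]

# No such formula exists for an order-preserving partitioner: between
# tokens "a" and "h" there are infinitely many possible string keys,
# so each node can only estimate its share from the keys it holds.
```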
>>>> If you look at the 192 node, it's showing ownership mostly for 192, 191 and
>>>> 190, so I'm assuming RF 3 and that 192 also has data from the ranges owned
>>>> by 191 and 190.
>>>> IMHO you can ignore this.
>>>> You can use the load and the number-of-keys estimate from cfstats to get an
>>>> idea of what's happening.
>>>> Hope that helps.
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>> On 19/08/2011, at 9:42 PM, Thibaut Britz wrote:
>>>>
>>>> Hi,
>>>>
>>>> we were using apache-cassandra-2011-06-28_08-04-46.jar so far in
>>>> production and wanted to upgrade to 0.8.4.
>>>>
>>>> Our cluster was well balanced and we only saved keys with a lower-case
>>>> md5 prefix (OrderPreservingPartitioner).
>>>> Each node owned 20% of the tokens, which was also displayed on each
>>>> node in nodetool -h localhost ring.
>>>>
>>>> After upgrading, our well-balanced cluster shows completely wrong
>>>> percentages for who owns which keys:
>>>>
>>>> *.*.*.190:
>>>> Address    DC          Rack   Status  State   Load       Owns    Token
>>>>                                                                  ffffffffffffffff
>>>> *.*.*.190  datacenter1 rack1  Up      Normal  87.95 GB   34.57%  2a
>>>> *.*.*.191  datacenter1 rack1  Up      Normal  84.3 GB     0.02%  55
>>>> *.*.*.192  datacenter1 rack1  Up      Normal  79.46 GB    0.02%  80
>>>> *.*.*.194  datacenter1 rack1  Up      Normal  68.16 GB    0.02%  aa
>>>> *.*.*.196  datacenter1 rack1  Up      Normal  79.9 GB    65.36%  ffffffffffffffff
>>>>
>>>> *.*.*.191:
>>>> Address    DC          Rack   Status  State   Load       Owns    Token
>>>>                                                                  ffffffffffffffff
>>>> *.*.*.190  datacenter1 rack1  Up      Normal  87.95 GB   36.46%  2a
>>>> *.*.*.191  datacenter1 rack1  Up      Normal  84.3 GB    26.02%  55
>>>> *.*.*.192  datacenter1 rack1  Up      Normal  79.46 GB    0.02%  80
>>>> *.*.*.194  datacenter1 rack1  Up      Normal  68.16 GB    0.02%  aa
>>>> *.*.*.196  datacenter1 rack1  Up      Normal  79.9 GB    37.48%  ffffffffffffffff
>>>>
>>>> *.*.*.192:
>>>> Address    DC          Rack   Status  State   Load       Owns    Token
>>>>                                                                  ffffffffffffffff
>>>> *.*.*.190  datacenter1 rack1  Up      Normal  87.95 GB   38.16%  2a
>>>> *.*.*.191  datacenter1 rack1  Up      Normal  84.3 GB    27.61%  55
>>>> *.*.*.192  datacenter1 rack1  Up      Normal  79.46 GB   34.17%  80
>>>> *.*.*.194  datacenter1 rack1  Up      Normal  68.16 GB    0.02%  aa
>>>> *.*.*.196  datacenter1 rack1  Up      Normal  79.9 GB     0.02%  ffffffffffffffff
>>>>
>>>> *.*.*.194:
>>>> Address    DC          Rack   Status  State   Load       Owns    Token
>>>>                                                                  ffffffffffffffff
>>>> *.*.*.190  datacenter1 rack1  Up      Normal  87.95 GB    0.03%  2a
>>>> *.*.*.191  datacenter1 rack1  Up      Normal  84.3 GB    31.43%  55
>>>> *.*.*.192  datacenter1 rack1  Up      Normal  79.46 GB   39.69%  80
>>>> *.*.*.194  datacenter1 rack1  Up      Normal  68.16 GB   28.82%  aa
>>>> *.*.*.196  datacenter1 rack1  Up      Normal  79.9 GB     0.03%  ffffffffffffffff
>>>>
>>>> *.*.*.196:
>>>> Address    DC          Rack   Status  State   Load       Owns    Token
>>>>                                                                  ffffffffffffffff
>>>> *.*.*.190  datacenter1 rack1  Up      Normal  87.95 GB    0.02%  2a
>>>> *.*.*.191  datacenter1 rack1  Up      Normal  84.3 GB     0.02%  55
>>>> *.*.*.192  datacenter1 rack1  Up      Normal  79.46 GB    0.02%  80
>>>> *.*.*.194  datacenter1 rack1  Up      Normal  68.16 GB   27.52%  aa
>>>> *.*.*.196  datacenter1 rack1  Up      Normal  79.9 GB    72.42%  ffffffffffffffff
>>>>
>>>>
>>>> Interestingly, each server shows something completely different.
>>>>
>>>> Removing the locationInfo files didn't help.
>>>> -Dcassandra.load_ring_state=false didn't help either.
>>>>
>>>> Our cassandra.yaml is at http://pastebin.com/pCVCt3RM
>>>>
>>>> Any idea what might cause this? Is it safe to suspect that
>>>> operating under this distribution will cause severe data loss? Or can
>>>> I safely ignore this?
>>>>
>>>> Thanks,
>>>> Thibaut
>>
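(The per-node discrepancy in the ring output above can be sketched as follows. This is a simplified model of the key-count-based estimate introduced by CASSANDRA-2800 as described in the thread, not Cassandra's actual code; the key counts are hypothetical, chosen only to mirror node .190's displayed percentages.)

```python
# Simplified model: under an order-preserving partitioner each node
# estimates "Owns" from the keys *it* locally holds per token range.
# With RF 3, a node also holds replicas from adjacent ranges, so every
# node reports different percentages for the same ring.

def opp_ownership(local_key_counts: dict) -> dict:
    """Estimate per-range ownership fractions from one node's
    local key counts, keyed by the range's end token."""
    total = sum(local_key_counts.values())
    return {rng: n / total for rng, n in local_key_counts.items()}

# Hypothetical counts on node .190, which replicates the "2a" and
# "ffff..." ranges but sees almost nothing from the middle of the ring:
node_190 = {"2a": 3457, "55": 2, "80": 2, "aa": 2, "ffff": 6537}
estimate = opp_ownership(node_190)
print({rng: f"{frac:.2%}" for rng, frac in estimate.items()})
```

A different node would feed in different local counts and print a different distribution, which is why every server in the output above disagrees.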