Date: Tue, 21 Jun 2011 15:37:47 +0000 (UTC)
From: "David Arena (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <124361737.24656.1308670667601.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1171525150.20688.1308577787429.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Issue Comment Edited] (CASSANDRA-2798) Repair Fails 0.8

    [ https://issues.apache.org/jira/browse/CASSANDRA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052613#comment-13052613 ]

David Arena edited comment on CASSANDRA-2798 at 6/21/11 3:37 PM:
-----------------------------------------------------------------

OK, so I can't exactly hand out the inserting script, but I can give you an indication of the data format. Our test scripts are complex classes that build objects. However, this is EXACTLY what is actually written to Cassandra (CF test1), with a little obfuscation:

UUID4 key: f9bb44f2844241df971e0975005c87dc
DATA FORMAT:
{'i': '[{"pr": ["XYZ", "0.47"], "va": [["XZ", "0.19"]], "pu": "1307998855", "iu": "http://devel.test.com/test2730", "it": "TESTERtype 0: TESTERobject 2730", "pi": {"XYZ": "0.31", "XZ": "0.47"}, "id": "0!2730"}]', 'cu': 'XYZ', 'cd': '1308668648'}

For CF test2, the data format looks like this:

UUID4 key: f9bb44f2844241df971e0975005c87dc
DATA FORMAT:
('0!2243', {'rt': '1308221914', 'ri': '1308218344', 'pu': '1308218344'}),
('1!2342', {'pu': '1308080741'}),
('2!1731', {'pu': '1308618693'}),
('3!3772', {'pu': '1308338296'}), ...

There can be up to 100 fields per key in CF test2; basically, for every insert into CF test1 there is a corresponding insert into CF test2 (with roughly 50-100 fields).

Try loading 100,000 random uuid4 keys into CF test1 with the corresponding keys/fields in CF test2, along the lines of the sketch below.
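To make the load pattern concrete, here is a minimal loader sketch using pycassa, the Python Thrift client of that era. This is an assumption, not the reporter's actual script: the keyspace name 'TestKS', the server address, and treating test2 as a super column family (suggested by the nested subcolumns in the sample) are all guesses.

# A minimal loader sketch, NOT the reporter's script. Assumptions: keyspace
# 'TestKS', a standard CF 'test1', a super CF 'test2' (the sample rows show
# subcolumns), one node at 10.0.1.150:9160, and the pycassa Thrift client.
import random
import uuid

import pycassa

pool = pycassa.ConnectionPool('TestKS', server_list=['10.0.1.150:9160'])
test1 = pycassa.ColumnFamily(pool, 'test1')
test2 = pycassa.ColumnFamily(pool, 'test2')  # assumed super CF in the schema

for _ in range(100000):
    key = uuid.uuid4().hex  # e.g. 'f9bb44f2844241df971e0975005c87dc'

    # One row in CF test1, mirroring the sample above (JSON blob abbreviated).
    test1.insert(key, {
        'i': '[{"pr": ["XYZ", "0.47"], "pu": "1307998855", "id": "0!2730"}]',
        'cu': 'XYZ',
        'cd': '1308668648',
    })

    # A corresponding row in CF test2 with roughly 50-100 '<n>!<id>' entries,
    # each holding a small map of subcolumns like the sample rows.
    row = {}
    for n in range(random.randint(50, 100)):
        row['%d!%d' % (n, random.randint(1000, 4000))] = {'pu': '1308338296'}
    test2.insert(key, row)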
Furthermore, I have retried the test precisely, again and again, including a flush before killing node3. Still, I am not able to succeed.

Before:
10.0.1.150  Up  Normal  2.61 GB   33.33%  0
10.0.1.152  Up  Normal  2.61 GB   33.33%  56713727820156410577229101238628035242
10.0.1.154  Up  Normal  2.61 GB   33.33%  113427455640312821154458202477256070485

After killing the node and restarting:
10.0.1.150  Up  Normal  2.61 GB   33.33%  0
10.0.1.152  Up  Normal  2.61 GB   33.33%  56713727820156410577229101238628035242
10.0.1.154  Up  Normal  61.69 KB  33.33%  113427455640312821154458202477256070485

After running repair:
10.0.1.150  Up  Normal  4.76 GB   33.33%  0
10.0.1.152  Up  Normal  5.41 GB   33.33%  56713727820156410577229101238628035242
10.0.1.154  Up  Normal  8.87 GB   33.33%  113427455640312821154458202477256070485

After running flush & compact on ALL nodes:
10.0.1.150  Up  Normal  4.76 GB   33.33%  0
10.0.1.152  Up  Normal  5.41 GB   33.33%  56713727820156410577229101238628035242
10.0.1.154  Up  Normal  4.86 GB   33.33%  113427455640312821154458202477256070485

This does not occur in 0.7.6; in fact, it works perfectly.
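As a checklist, the retest sequence on node 3 amounts to roughly the following shell steps, mirroring the command list in the issue description below. The data/log paths and the pgrep pattern are assumptions from a default install, not from the report:

# Retest sketch for node 3 (10.0.1.154); paths and patterns are assumptions.
nodetool -h 10.0.1.154 flush              # flush before killing node3
kill -9 $(pgrep -f CassandraDaemon)       # run on node 3 itself
rm -rf /var/lib/cassandra/data/* \
       /var/lib/cassandra/commitlog/* \
       /var/log/cassandra/*               # "All data & logs"
bin/cassandra                             # or your init script, to restart node 3 empty
nodetool -h 10.0.1.154 repair             # hangs; load climbs on all nodes
nodetool -h 10.0.1.154 ring               # compare the Load column before/after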
> Repair Fails 0.8
> ----------------
>
>                 Key: CASSANDRA-2798
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2798
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: David Arena
>            Assignee: Sylvain Lebresne
>
> I am seeing a fatal problem in the new 0.8.
> I'm running a 3 node cluster with a replication_factor of 3.
> On node 3, if I:
> # kill -9 cassandra-pid
> # rm -rf "All data & logs"
> # start cassandra
> # nodetool -h "node-3-ip" repair
> the whole cluster becomes duplicated, e.g.:
> * Before
> node 1 -> 2.65GB
> node 2 -> 2.65GB
> node 3 -> 2.65GB
> * After
> node 1 -> 5.3GB
> node 2 -> 5.3GB
> node 3 -> 7.95GB
> -> nodetool repair never ends (96+ hours), yet there are no streams running, nor any CPU or disk activity.
> -> Manually killing the repair and restarting does not help. Restarting the server/Cassandra does not help.
> -> nodetool flush, compact, and cleanup all complete, but do not help.
> This is not occurring in 0.7.6. I have come to the conclusion that this is a major 0.8 issue.
> Running: CentOS 5.6, JDK 1.6.0_26

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira