Date: Tue, 21 Jun 2011 15:37:47 +0000 (UTC)
From: "David Arena (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <124361737.24656.1308670667601.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1171525150.20688.1308577787429.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Issue Comment Edited] (CASSANDRA-2798) Repair Fails 0.8

    [ https://issues.apache.org/jira/browse/CASSANDRA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052613#comment-13052613 ]

David Arena edited comment on CASSANDRA-2798 at 6/21/11 3:37 PM:
-----------------------------------------------------------------

OK, so I can't exactly hand out the inserting script, but I can give you an indication of the data format. Our test scripts are complex classes that build objects. However, this is EXACTLY what is actually written to Cassandra (CF test1), with a little obfuscation:

UUID4 key: f9bb44f2844241df971e0975005c87dc
DATA FORMAT:
{'i': '[{"pr": ["XYZ", "0.47"], "va": [["XZ", "0.19"]], "pu": "1307998855", "iu": "http://devel.test.com/test2730", "it": "TESTERtype 0: TESTERobject 2730", "pi": {"XYZ": "0.31", "XZ": "0.47"}, "id": "0!2730"}]', 'cu': 'XYZ', 'cd': '1308668648'}

For CF test2, the data format looks like this:

UUID4 key: f9bb44f2844241df971e0975005c87dc
DATA FORMAT:
('0!2243', {'rt': '1308221914', 'ri': '1308218344', 'pu': '1308218344'}),
('1!2342', {'pu': '1308080741'}),
('2!1731', {'pu': '1308618693'}),
('3!3772', {'pu': '1308338296'}), ...

There can be up to 100 fields per key in CF test2; basically, for every insert into CF test1 there is a corresponding insert into CF test2 (with roughly 50-100 fields).

Try loading 100,000 random uuid4 keys into CF test1 with the corresponding keys/fields in CF test2, along the lines of the sketch below.
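To make the load pattern concrete, here is a minimal loader sketch using pycassa, the Python Thrift client of that era. This is an assumption, not the reporter's actual script: the keyspace name 'TestKS', the server address, and treating test2 as a super column family (suggested by the nested subcolumns in the sample) are all guesses.

# A minimal loader sketch, NOT the reporter's script. Assumptions: keyspace
# 'TestKS', a standard CF 'test1', a super CF 'test2' (the sample rows show
# subcolumns), one node at 10.0.1.150:9160, and the pycassa Thrift client.
import random
import uuid

import pycassa

pool = pycassa.ConnectionPool('TestKS', server_list=['10.0.1.150:9160'])
test1 = pycassa.ColumnFamily(pool, 'test1')
test2 = pycassa.ColumnFamily(pool, 'test2')  # assumed super CF in the schema

for _ in range(100000):
    key = uuid.uuid4().hex  # e.g. 'f9bb44f2844241df971e0975005c87dc'

    # One row in CF test1, mirroring the sample above (JSON blob abbreviated).
    test1.insert(key, {
        'i': '[{"pr": ["XYZ", "0.47"], "pu": "1307998855", "id": "0!2730"}]',
        'cu': 'XYZ',
        'cd': '1308668648',
    })

    # A corresponding row in CF test2 with roughly 50-100 '<n>!<id>' entries,
    # each holding a small map of subcolumns like the sample rows.
    row = {}
    for n in range(random.randint(50, 100)):
        row['%d!%d' % (n, random.randint(1000, 4000))] = {'pu': '1308338296'}
    test2.insert(key, row)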
Furthermore, I have retried the test precisely, again and again, including a flush before killing node3. Still, I am not able to succeed.

Before:
10.0.1.150  Up  Normal  2.61 GB   33.33%  0
10.0.1.152  Up  Normal  2.61 GB   33.33%  56713727820156410577229101238628035242
10.0.1.154  Up  Normal  2.61 GB   33.33%  113427455640312821154458202477256070485

After killing the node and restarting:
10.0.1.150  Up  Normal  2.61 GB   33.33%  0
10.0.1.152  Up  Normal  2.61 GB   33.33%  56713727820156410577229101238628035242
10.0.1.154  Up  Normal  61.69 KB  33.33%  113427455640312821154458202477256070485

After running repair:
10.0.1.150  Up  Normal  4.76 GB   33.33%  0
10.0.1.152  Up  Normal  5.41 GB   33.33%  56713727820156410577229101238628035242
10.0.1.154  Up  Normal  8.87 GB   33.33%  113427455640312821154458202477256070485

After running flush & compact on ALL nodes:
10.0.1.150  Up  Normal  4.76 GB   33.33%  0
10.0.1.152  Up  Normal  5.41 GB   33.33%  56713727820156410577229101238628035242
10.0.1.154  Up  Normal  4.86 GB   33.33%  113427455640312821154458202477256070485

This does not occur in 0.7.6; in fact, it works perfectly.
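As a checklist, the retest sequence on node 3 amounts to roughly the following shell steps, mirroring the command list in the issue description below. The data/log paths and the pgrep pattern are assumptions from a default install, not from the report:

# Retest sketch for node 3 (10.0.1.154); paths and patterns are assumptions.
nodetool -h 10.0.1.154 flush              # flush before killing node3
kill -9 $(pgrep -f CassandraDaemon)       # run on node 3 itself
rm -rf /var/lib/cassandra/data/* \
       /var/lib/cassandra/commitlog/* \
       /var/log/cassandra/*               # "All data & logs"
bin/cassandra                             # or your init script, to restart node 3 empty
nodetool -h 10.0.1.154 repair             # hangs; load climbs on all nodes
nodetool -h 10.0.1.154 ring               # compare the Load column before/after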
> Repair Fails 0.8
> ----------------
>
>                 Key: CASSANDRA-2798
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2798
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: David Arena
>            Assignee: Sylvain Lebresne
>
> I am seeing a fatal problem in the new 0.8.
> I'm running a 3 node cluster with a replication_factor of 3.
> On node 3, if I:
> # kill -9 cassandra-pid
> # rm -rf "All data & logs"
> # start cassandra
> # nodetool -h "node-3-ip" repair
> the whole cluster becomes duplicated, e.g.:
> * Before
> node 1 -> 2.65GB
> node 2 -> 2.65GB
> node 3 -> 2.65GB
> * After
> node 1 -> 5.3GB
> node 2 -> 5.3GB
> node 3 -> 7.95GB
> -> nodetool repair never ends (96+ hours), yet there are no streams running, nor any CPU or disk activity.
> -> Manually killing the repair and restarting does not help. Restarting the server/Cassandra does not help.
> -> nodetool flush, compact, and cleanup all complete, but do not help.
> This is not occurring in 0.7.6. I have come to the conclusion that this is a major 0.8 issue.
> Running: CentOS 5.6, JDK 1.6.0_26

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira