cassandra-commits mailing list archives

From "David Arena (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-2798) Repair Fails 0.8
Date Tue, 21 Jun 2011 15:37:47 GMT

[ https://issues.apache.org/jira/browse/CASSANDRA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052613#comment-13052613 ]

David Arena edited comment on CASSANDRA-2798 at 6/21/11 3:37 PM:
-----------------------------------------------------------------

OK, so I can't exactly hand out the inserting script, but I can give you an indication of the data format. Our test scripts are complex classes that build objects. However, this is EXACTLY an output of what is actually written to Cassandra (CF test1), with a little obfuscation:

UUID4 key:
f9bb44f2844241df971e0975005c87dc   
DATA FORMAT:
{'i': '[{"pr": ["XYZ", "0.47"], "va": [["XZ", "0.19"]], "pu": "1307998855", "iu": "http://devel.test.com/test2730",
"it": "TESTERtype 0: TESTERobject 2730", "pi": {"XYZ": "0.31", "XZ": "0.47"}, "id": "0!2730"}]',
'cu': 'XYZ', 'cd': '1308668648'}

For CF test2, the data format looks like this:
UUID4 key:
f9bb44f2844241df971e0975005c87dc
DATA FORMAT:
('0!2243', {'rt': '1308221914', 'ri': '1308218344', 'pu': '1308218344'}), ('1!2342', {'pu':
'1308080741'}), ('2!1731', {'pu': '1308618693'}), ('3!3772', {'pu': '1308338296'})..

There can be up to 100 fields per key in CF test2.

Basically, for every insert in CF test1 there is a corresponding insert in CF test2 (with roughly 50-100 fields).

Try loading 100,000 random uuid4 keys into CF test1, with the corresponding keys/fields in CF test2...
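For anyone who wants to reproduce the load without our scripts, the shape of the data above can be generated with a short sketch like this. The field values, counts, and helper names here are illustrative (like the obfuscated sample), not the real data; actually inserting the rows with a client is left out:

```python
import json
import random
import uuid

def make_test1_row():
    """Build one row in the CF test1 format shown above (values illustrative)."""
    n = random.randint(0, 5000)
    item = {
        "pr": ["XYZ", "0.47"],
        "va": [["XZ", "0.19"]],
        "pu": "1307998855",
        "iu": "http://devel.test.com/test%d" % n,
        "it": "TESTERtype 0: TESTERobject %d" % n,
        "pi": {"XYZ": "0.31", "XZ": "0.47"},
        "id": "0!%d" % n,
    }
    # Column values are strings; the 'i' column holds a JSON-encoded list.
    return {"i": json.dumps([item]), "cu": "XYZ", "cd": "1308668648"}

def make_test2_row(n_fields):
    """Build the matching CF test2 row: 50-100 ('N!MMMM' -> dict) columns."""
    return dict(
        ("%d!%d" % (i, random.randint(0, 5000)), {"pu": "1308080741"})
        for i in range(n_fields)
    )

def generate(count=100000):
    """Yield (key, test1_row, test2_row) triples keyed by random uuid4 hex."""
    for _ in range(count):
        key = uuid.uuid4().hex
        yield key, make_test1_row(), make_test2_row(random.randint(50, 100))
```

Each yielded triple corresponds to one insert in CF test1 and the matching multi-field insert in CF test2, keyed by the same uuid4.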

Furthermore, I have retried the test, precisely, again and again, including a flush before killing node3. Still I am not able to succeed.

Before...
10.0.1.150      Up     Normal  2.61 GB         33.33%  0
10.0.1.152      Up     Normal  2.61 GB         33.33%  56713727820156410577229101238628035242
10.0.1.154      Up     Normal  2.61 GB         33.33%  113427455640312821154458202477256070485


After Killing Node and Restart...
10.0.1.150      Up     Normal  2.61 GB         33.33%  0
10.0.1.152      Up     Normal  2.61 GB         33.33%  56713727820156410577229101238628035242
10.0.1.154      Up     Normal  61.69 KB        33.33%  113427455640312821154458202477256070485

After Running Repair...
10.0.1.150      Up     Normal  4.76 GB         33.33%  0
10.0.1.152      Up     Normal  5.41 GB         33.33%  56713727820156410577229101238628035242
10.0.1.154      Up     Normal  8.87 GB         33.33%  113427455640312821154458202477256070485

After Running Flush & Compact on ALL nodes...
10.0.1.150      Up     Normal  4.76 GB         33.33%  0
10.0.1.152      Up     Normal  5.41 GB         33.33%  56713727820156410577229101238628035242
10.0.1.154      Up     Normal  4.86 GB         33.33%  113427455640312821154458202477256070485

This does not occur in 0.7.6; in fact, it works perfectly.
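The test sequence I am running against node3 (flush, kill, wipe, restart, repair) looks roughly like this as a script. The pid-file location, init script, and data directories here are placeholders for a default-ish install, not the real paths:

```shell
# Repro sketch for CASSANDRA-2798; paths below are placeholders, adjust to taste.
NODE3=10.0.1.154
DATA_DIRS="/var/lib/cassandra/data /var/lib/cassandra/commitlog /var/log/cassandra"

repro() {
  nodetool -h "$NODE3" flush                           # flush memtables first
  ssh "$NODE3" 'kill -9 "$(cat /var/run/cassandra.pid)"'
  ssh "$NODE3" "rm -rf $DATA_DIRS"                     # wipe all data & logs
  ssh "$NODE3" '/etc/init.d/cassandra start'
  nodetool -h "$NODE3" repair                          # hangs / duplicates on 0.8.0
  nodetool -h "$NODE3" ring                            # compare the Load column
}
```

After each step, the `Load` column from `nodetool ring` is what the before/after tables above are showing.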

> Repair Fails 0.8
> ----------------
>
>                 Key: CASSANDRA-2798
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2798
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: David Arena
>            Assignee: Sylvain Lebresne
>
> I am seeing a fatal problem in the new 0.8.
> I'm running a 3 node cluster with a replication_factor of 3.
> On node 3, if I
> # kill -9 cassandra-pid
> # rm -rf "All data & logs"
> # start cassandra
> # nodetool -h "node-3-ip" repair
> the whole cluster becomes duplicated.
> * e.g Before 
> node 1 -> 2.65GB
> node 2 -> 2.65GB
> node 3 -> 2.65GB
> * e.g After
> node 1 -> 5.3GB
> node 2 -> 5.3GB
> node 3 -> 7.95GB
> -> nodetool repair never ends (96+ hours), yet there are no streams running, nor any CPU or disk activity.
> -> Manually killing the repair and restarting does not help. Restarting the server/Cassandra does not help.
> -> nodetool flush, compact, and cleanup all complete, but do not help.
> This is not occurring in 0.7.6. I have come to the conclusion this is a major 0.8 issue.
> Running: CentOS 5.6, JDK 1.6.0_26

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
