cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1674) Repair using abnormally large amounts of disk space
Date Wed, 17 Nov 2010 20:08:24 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933122#action_12933122 ]

Jonathan Ellis commented on CASSANDRA-1674:
-------------------------------------------

Committed, but I think there is a bug.  With a similar setup to the above (200K keys instead of 1M), the pre-RF-change ring looks like this:

{code}
Address         Status State   Load            Token
                                               106239986353888428655683112465158427815
127.0.0.2       Up     Normal  37.97 MB        21212647344528771789748883276744400257
127.0.0.3       Up     Normal  18.98 MB        63523312719601176253752035031089272162
127.0.0.1       Up     Normal  19.05 MB        106239986353888428655683112465158427815
{code}

post-repair is
{code}
Address         Status State   Load            Token
                                               106239986353888428655683112465158427815
127.0.0.2       Up     Normal  57.01 MB        21212647344528771789748883276744400257
127.0.0.3       Up     Normal  56.94 MB        63523312719601176253752035031089272162
127.0.0.1       Up     Normal  19.07 MB        106239986353888428655683112465158427815
{code}
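
As a rough sanity check, assuming the RF change referenced above took the keyspace from one replica to two: the pre-repair loads sum to roughly 76 MB (37.97 + 18.98 + 19.05), so a fully repaired ring should hold about twice that, ~152 MB in total; the post-repair loads sum to roughly 133 MB (57.01 + 56.94 + 19.07).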

So, eyeballing it, the loads look reasonable.  But when I kill node 2 and run

{code}$ python contrib/py_stress/stress.py -n 200000 -o read{code}

I get a ton of key-not-found exceptions, indicating that not all of the data on node 2 got replicated to node 3.
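
To put a number on that, one option is to read every key back directly through the Thrift API from one of the surviving nodes and count the misses.  The sketch below is purely illustrative and not part of the tree: it assumes the py_stress defaults (Keyspace1/Standard1, a column named 'C0'), a framed transport, zero-padded decimal keys, and that the generated Python Thrift bindings used by stress.py are on the PYTHONPATH -- the exact key format and column names should be taken from contrib/py_stress/stress.py.

{code}
# Illustrative only -- counts keys that a surviving node cannot find at
# ConsistencyLevel.ONE.  All names below (Keyspace1, Standard1, 'C0',
# the '%010d' key format) are assumptions taken from the py_stress
# defaults; verify them against contrib/py_stress/stress.py.
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import ColumnPath, ConsistencyLevel, NotFoundException

socket = TSocket.TSocket('127.0.0.3', 9160)      # one of the surviving nodes
transport = TTransport.TFramedTransport(socket)
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()
client.set_keyspace('Keyspace1')

path = ColumnPath(column_family='Standard1', column='C0')
missing = 0
for i in xrange(200000):
    key = '%010d' % i                            # assumed key format
    try:
        client.get(key, path, ConsistencyLevel.ONE)
    except NotFoundException:
        missing += 1

print '%d of 200000 keys missing' % missing
transport.close()
{code}

If repair had streamed everything it should have, that count would come back as zero.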

> Repair using abnormally large amounts of disk space
> ---------------------------------------------------
>
>                 Key: CASSANDRA-1674
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1674
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stu Hood
>             Fix For: 0.6.9, 0.7.0
>
>         Attachments: 0001-Only-repair-the-intersecting-portion-of-a-differing-ra.txt, for-0.6-0001-Only-repair-the-intersecting-portion-of-a-differing-ra.txt
>
>
> I'm watching a repair on a 7 node cluster.  Repair was sent to one node; the node had 18G of data.  No other node has more than 28G.  The node where the repair initiated is now up to 261G with 53/60 AES tasks outstanding.
> I have seen repair take more space than expected on 0.6 but nothing this extreme.
> Other nodes in the cluster are occasionally logging
> WARN [ScheduledTasks:1] 2010-10-28 08:31:14,305 MessagingService.java (line 515) Dropped 7 messages in the last 1000ms
> The cluster is quiesced except for the repair.  Not sure if the dropped messages are contributing to the disk space (b/c of retries?).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

