incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: nodetool repair taking forever
Date Tue, 22 May 2012 09:05:18 GMT
> I also don't understand: if all these nodes are replicas of each other, why does the first
> node have almost double the data?
Have you performed any token moves? Old data is not deleted after a move unless you run nodetool cleanup.
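If you have moved tokens, something along these lines (host is a placeholder; run it on one node at a time) will drop the data each node no longer owns:

    # run on each node after a token move; removes data outside the node's ranges
    nodetool -h <host> cleanup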

Another possibility is stored hints, though admittedly it would have to be a *lot*
of hints to account for the difference.
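A rough way to check the hint volume (assuming the standard system keyspace layout in 0.8) is to look at the space used by HintsColumnFamily in cfstats:

    # look for "Space used" under the system keyspace's HintsColumnFamily
    nodetool -h <host> cfstats | grep -A 12 HintsColumnFamily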
The third is that compaction has fallen behind. 

> This week it's even worse: the nodetool repair has been running for the last 15 hours
> just on the first node, and when I run nodetool compactionstats I constantly see this -
> 
> pending tasks: 3
First check the logs for errors. 
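For example (the log path is an assumption, adjust to your install):

    # scan for errors and stack traces on the node running the repair
    grep -iE "error|exception" /var/log/cassandra/system.log | tail -20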

Repair first calculates the differences between replicas; you can see this as a validation
compaction in nodetool compactionstats.
It then streams the mismatched data, which you can watch with nodetool netstats.
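So, roughly, on the node running the repair (host is a placeholder):

    # phase 1: differencing - look for a "Validation" compaction
    nodetool -h <host> compactionstats
    # phase 2: streaming - look for active streams to and from the replicas
    nodetool -h <host> netstats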

Try to work out which part is taking the most time. 15 hours for 50 GB sounds like a long
time (by the way, is compaction enabled?)
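If compaction itself looks like the bottleneck, it may also be worth checking the throttle in cassandra.yaml (the 16 MB/s default here is from the 0.8-era config; 0 disables throttling):

    # cassandra.yaml
    compaction_throughput_mb_per_sec: 16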

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/05/2012, at 3:14 AM, Raj N wrote:

> Hi experts,
> 
> I have a 6 node cluster spread across 2 DCs. 
> 
>     DC          Rack        Status State   Load            Owns    Token
>                                                                    113427455640312814857969558651062452225
>     DC1         RAC13       Up     Normal  95.98 GB        33.33%  0
>     DC2         RAC5        Up     Normal  50.79 GB        0.00%   1
>     DC1         RAC18       Up     Normal  50.83 GB        33.33%  56713727820156407428984779325531226112
>     DC2         RAC7        Up     Normal  50.74 GB        0.00%   56713727820156407428984779325531226113
>     DC1         RAC19       Up     Normal  61.72 GB        33.33%  113427455640312814857969558651062452224
>     DC2         RAC9        Up     Normal  50.83 GB        0.00%   113427455640312814857969558651062452225
> 
> They are all replicas of each other. All reads and writes are done at LOCAL_QUORUM. We
> are on Cassandra 0.8.4. I see that our weekend nodetool repair runs for more than 12 hours,
> especially on the first node, which has 96 GB of data. Is this usual? We are using 500 GB
> SAS drives with an ext4 file system. It gets worse every week. This week it's even worse:
> the nodetool repair has been running for the last 15 hours just on the first node, and when
> I run nodetool compactionstats I constantly see this -
> 
> pending tasks: 3
> 
> and nothing else. It looks like it's just stuck. There's nothing substantial in the logs
> either. I also don't understand: if all these nodes are replicas of each other, why does
> the first node have almost double the data? Any help will be really appreciated.
> 
> Thanks
> -Raj

