cassandra-user mailing list archives

From Héctor Izquierdo Seliva <izquie...@strands.com>
Subject Re: Problems recovering a dead node
Date Wed, 04 May 2011 13:02:12 GMT

> 
> On Wed, 04-05-2011 at 21:02 +1200, aaron morton wrote:
> > Certainly sounds a bit sick. 
> > 
> > The first error looks like it happens when the index file points to the wrong
> > place in the data file for the SSTable. The second one happens when the index
> > file is corrupted. These should be problems nodetool scrub can fix.
> > 
> > The disk space may be dead space, left behind by compaction or by some
> > streaming failure. You can check how much of it Cassandra considers live
> > (in use) with nodetool cfstats, which also tells you how many SSTables are
> > live. Having a lot of dead SSTables is not necessarily a bad thing.
> > 
> > What are the pending tasks? What is nodetool tpstats showing? And what does
> > nodetool ring show from one of the other nodes?
> > 
> > I'm assuming there are no errors in the logs on the node. What are the most
> > recent INFO messages?
> > 
> > Hope that helps. 
> > 
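The quoted reply suggests comparing live and total space in nodetool cfstats to see how much of the disk usage is dead. A minimal Python sketch of that comparison, assuming the output uses lines labelled `Column Family:`, `Space used (live):` and `Space used (total):` with byte counts (the labels and units may differ between Cassandra versions). The `dead_space` helper and the sample text are hypothetical, and the sample numbers are invented for illustration only.

```python
import re

# Hypothetical sample shaped like one column family's block in
# `nodetool cfstats` output; the numbers are made up.
SAMPLE = """\
\t\tColumn Family: Example
\t\tSSTable count: 12
\t\tSpace used (live): 1073741824
\t\tSpace used (total): 3221225472
"""


def dead_space(cfstats_text):
    """Map each column family to (live_bytes, total_bytes, dead_bytes)."""
    seen = {}
    current = None
    for line in cfstats_text.splitlines():
        line = line.strip()
        m = re.match(r"Column Family: (\S+)", line)
        if m:
            current = m.group(1)
            seen[current] = {}
            continue
        m = re.match(r"Space used \((live|total)\): (\d+)", line)
        if m and current is not None:
            seen[current][m.group(1)] = int(m.group(2))
    # Only report column families where both figures were found.
    return {
        cf: (v["live"], v["total"], v["total"] - v["live"])
        for cf, v in seen.items()
        if "live" in v and "total" in v
    }


for cf, (live, total, dead) in dead_space(SAMPLE).items():
    print(f"{cf}: live={live} total={total} dead={dead}")
```

A large gap between total and live points at SSTables that are no longer in use but have not yet been deleted from disk.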


In the end I had to run repair again, as I was getting old data
back. It seems I'm having the same problem again. Here are my cfstats:

http://pastebin.com/B9eD3b4R

I have 796 SSTables totalling 108 GB (and counting) in my data
folder. Almost all of them come from streaming. There are 694 pending operations.

Here is my ring info:

Address         Status State   Load            Owns    Token
10.20.13.75     Up     Normal  16.99 GB        16.67%  28356863910078205288614550619314017621
10.20.13.76     Up     Normal  26.76 GB        16.67%  56713727820156410577229101238628035242
10.20.13.77     Up     Normal  28.23 GB        16.67%  85070591730234615865843651857942052863
10.20.13.78     Up     Normal  29.19 GB        16.67%  113427455640312821154458202477256070484
10.20.13.79     Up     Normal  27.71 GB        16.67%  141784319550391026443072753096570088105
10.20.13.80     Up     Normal  25.36 GB        16.67%  170141183460469231731687303715884105727
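The Owns column can be cross-checked from the tokens alone. A minimal Python sketch, assuming RandomPartitioner with the token space treated as 2**127 wide, and that each node owns the range from the previous node's token (exclusive) up to its own (inclusive). `ownership` is a hypothetical helper, not part of nodetool. On these tokens every node comes out at about 16.67%, so the spread in the Load column (16.99 GB to 29.19 GB) does not come from uneven token assignment.

```python
# Tokens copied from the ring output above, in ring order.
TOKENS = [
    28356863910078205288614550619314017621,
    56713727820156410577229101238628035242,
    85070591730234615865843651857942052863,
    113427455640312821154458202477256070484,
    141784319550391026443072753096570088105,
    170141183460469231731687303715884105727,
]

# Assumption: RandomPartitioner, with the ring treated as 2**127 tokens wide.
RING_SIZE = 2 ** 127


def ownership(tokens):
    """Fraction of the ring owned by each node, in the same order as tokens.

    A node owns (previous token, its own token]; for the first node the
    previous token is the last one in the list, so the range wraps around.
    """
    shares = []
    for i, token in enumerate(tokens):
        prev = tokens[i - 1]  # tokens[-1] when i == 0
        shares.append(((token - prev) % RING_SIZE) / RING_SIZE)
    return shares


for token, share in zip(TOKENS, ownership(TOKENS)):
    print(f"{share:.2%}  {token}")
```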

And here is the output of tpstats:

Pool Name                    Active   Pending      Completed
ReadStage                         0         0        6943016
RequestResponseStage              0         0       15011243
MutationStage                     0         0         964296
ReadRepairStage                   0         0        4064197
GossipStage                       0         0          59499
AntiEntropyStage                  0         0             77
MigrationStage                    0         0              0
MemtablePostFlusher               0         0             14
StreamStage                       0         0              0
FlushWriter                       0         0             14
FILEUTILS-DELETE-POOL             0         0              2
MiscStage                         0         0             83
FlushSorter                       0         0              0
InternalResponseStage             0         0              0
HintedHandoff                     0         0              6
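The tpstats table can also be scanned mechanically for thread pools with queued or running work. A minimal Python sketch, assuming the four-column layout shown above with pool names that contain no spaces; `busy_pools` is a hypothetical helper. On this output it finds no pool with active or pending tasks, so the 694 pending operations mentioned earlier do not appear in any of these thread pools.

```python
# The tpstats output above, copied verbatim.
TPSTATS = """\
Pool Name                    Active   Pending      Completed
ReadStage                         0         0        6943016
RequestResponseStage              0         0       15011243
MutationStage                     0         0         964296
ReadRepairStage                   0         0        4064197
GossipStage                       0         0          59499
AntiEntropyStage                  0         0             77
MigrationStage                    0         0              0
MemtablePostFlusher               0         0             14
StreamStage                       0         0              0
FlushWriter                       0         0             14
FILEUTILS-DELETE-POOL             0         0              2
MiscStage                         0         0             83
FlushSorter                       0         0              0
InternalResponseStage             0         0              0
HintedHandoff                     0         0              6
"""


def busy_pools(tpstats_text):
    """Return {pool: (active, pending)} for pools with any active or pending tasks."""
    busy = {}
    for line in tpstats_text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) != 4:
            continue  # ignore blank or unexpected lines
        name, active, pending, _completed = parts
        if int(active) or int(pending):
            busy[name] = (int(active), int(pending))
    return busy


print(busy_pools(TPSTATS) or "no active or pending tasks")
```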

