cassandra-user mailing list archives

From Anthony Molinaro <antho...@alumni.caltech.edu>
Subject Re: Overfull node
Date Mon, 17 May 2010 17:29:49 GMT
I had this happen when I changed the seed node in a running cluster and
then started and stopped various nodes.  I "fixed" it by restarting
the seed node(s) (and waiting for them to be fully up), then restarting
all the other nodes.

-Anthony
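
For reference, the restart order described above might look roughly like the
following shell sketch; the SEEDS/OTHERS host lists, the init-script path, and
using "Mode: Normal" from nodetool streams as the "fully up" check are all
assumptions, not details taken from this cluster:

    # sketch only -- host names and the init script path are placeholders
    SEEDS="seed1"
    OTHERS="node1 node2 node3"
    for h in $SEEDS; do
        ssh "$h" "/etc/init.d/cassandra restart"
        # treat "Mode: Normal" from nodetool streams as "fully up"
        until nodetool -h "$h" streams 2>/dev/null | grep -q "Mode: Normal"; do
            sleep 10
        done
    done
    for h in $OTHERS; do
        ssh "$h" "/etc/init.d/cassandra restart"
    done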

On Fri, May 14, 2010 at 05:11:40PM -0700, David Koblas wrote:
> I've somehow now ended up in a very strange place...
> 
> If I ask '150' or '155' about the ring they report each other, but if I 
> ask the rest of the ring they have '155' but not '150' as members.  All 
> of the storage-conf files are basically clones of each other with the 
> same ring masters.
> 
> $ nodetool -h 10.1.0.155 ring
> Address       Status     Load          Range                                      Ring
>                                        99811300882272441299595351868344045866
> 10.3.0.150    Up         3.08 TB       6436333895300580402214871779965756352      |<--|
> 10.1.0.155    Up         3.08 TB       99811300882272441299595351868344045866     |-->|
> 
> $ nodetool -h 10.2.0.174 ring
> Address       Status     Load          Range                                      Ring
>                                        144951579690133260853298391132870993575
> 10.2.0.115    Up         2.7 TB        55758122058160717108501182340054262660     |<--|
> 10.1.0.155    Up         3.08 TB       99811300882272441299595351868344045866     |   ^
> 10.2.0.174    Up         1.32 TB       118283207506463595491596277948095451613    v   |
> 10.3.0.151    Up         414.51 GB     127520031787005730998588483181387651399    |   ^
> 10.3.0.152    Up         143.03 GB     132137258578258111824507171284723589567    v   |
> 10.3.0.153    Up         245.51 GB     134446064220575108370358944111505967571    |   ^
> 10.2.0.175    Up         3.16 TB       136754979922617117666448707835107404441    v   |
> 10.2.0.114    Up         1.41 TB       144951579690133260853298391132870993575    |-->|
> 
> $ nodetool -h 10.3.0.150 streams
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> 
> $ nodetool -h 10.1.0.155 streams
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> 
> $ nodetool -h 10.2.0.115 streams
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
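
A quick way to surface the disagreement visible in the ring output above is to
pull the ring from every node and compare just the membership columns; a rough
sketch, using the addresses that appear in this thread purely as an example:

    # sketch only -- HOSTS is whatever list of node addresses you want to compare
    HOSTS="10.1.0.155 10.2.0.114 10.2.0.115 10.2.0.174 10.2.0.175 10.3.0.150 10.3.0.151 10.3.0.152 10.3.0.153"
    for h in $HOSTS; do
        echo "== ring as seen by $h =="
        # keep only the address column so the per-node views are easy to diff
        nodetool -h "$h" ring | awk '/^[0-9]/ {print $1}' | sort
    done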
> 
> 
> 
> On 5/11/10 6:30 AM, Jonathan Ellis wrote:
> >s/keyspace/token/ and you've got it.
> >
> >On Mon, May 10, 2010 at 10:34 AM, David Koblas <koblas@extra.com> wrote:
> >   
> >>Sounds great, will give it a go.  However, just to make sure I understand
> >>how to get the keyspace correct.
> >>
> >>Let's say I've got:
> >>    A -- Node before overfull node in keyspace order
> >>    O -- Overfull node
> >>    B -- Node after O in keyspace order
> >>    N -- New empty node
> >>
> >>I'm going to assume that I should make the following assignment:
> >>    keyspace(N) = keyspace(A) + ( keyspace(O) - keyspace(A) ) / 2
> >>
> >>Or did I miss something else about keyspace ranges?
> >>Thanks
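
Plugging in two tokens from the ring output earlier in the message (purely as
an illustration of the arithmetic; whether these are really the A and O of
this cluster is an assumption), the midpoint works out like this:

    # token(A) = node before the overfull node, token(O) = the overfull node
    A=6436333895300580402214871779965756352
    O=99811300882272441299595351868344045866
    echo "$A + ($O - $A) / 2" | bc
    # prints 53123817388786510850905111824154901109, roughly halfway between the two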
> >>
> >>
> >>On 5/7/10 1:25 PM, Jonathan Ellis wrote:
> >>     
> >>>If you're using RackUnawareStrategy (the default replication strategy)
> >>>then you can "bootstrap" manually fairly easily -- copy all the data
> >>>(not system) sstables from an overfull machine to a new machine,
> >>>assign the new one a token that gives it about half of the old node's
> >>>range, then start it with autobootstrap OFF.  Then run cleanup on both
> >>>new and old nodes to remove the part of the data that belongs to the
> >>>other.
> >>>
> >>>The downside vs real bootstrap is you can't do this safely while
> >>>writes are coming in to the original node.  You can reduce your
> >>>read-only period by doing an initial scp, then doing a flush + rsync
> >>>when you're ready to take it read only.
> >>>
> >>>(https://issues.apache.org/jira/browse/CASSANDRA-579 will make this
> >>>problem obsolete for 0.7 but that doesn't help you on 0.6, of course.)
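
As a rough shell sketch of those steps (the 0.6-era data directory, the "old"
and "new" host names, and "MyKeyspace" are placeholders, and exact nodetool
arguments can vary by version):

    # 1. initial bulk copy of the data (not system) sstables while old is still taking writes
    scp /var/lib/cassandra/data/MyKeyspace/*.db new:/var/lib/cassandra/data/MyKeyspace/
    # 2. stop writes to old, flush memtables, then rsync only the delta
    nodetool -h old flush MyKeyspace
    rsync -av /var/lib/cassandra/data/MyKeyspace/ new:/var/lib/cassandra/data/MyKeyspace/
    # 3. in new's storage-conf.xml set InitialToken to about half of old's range,
    #    leave AutoBootstrap off, then start cassandra on new
    # 4. remove the data each node no longer owns
    nodetool -h old cleanup
    nodetool -h new cleanup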
> >>>
> >>>On Fri, May 7, 2010 at 2:08 PM, David Koblas <koblas@extra.com> wrote:
> >>>
> >>>       
> >>>>I've got two (out of five) nodes on my cassandra ring that somehow got
> >>>>too full (e.g. over 60% disk space utilization).  I've now gotten a few
> >>>>new machines added to the ring, but every time one of the overfull nodes
> >>>>attempts to stream its data it runs out of disk space...  I've tried half
> >>>>a dozen different bad ideas of how to get things moving along a bit
> >>>>smoother, but am at a total loss at this point.
> >>>>
> >>>>Are there any good tricks to get cassandra to not need 2x the disk space
> >>>>to stream out, or is something else potentially going on that's causing
> >>>>me problems?
> >>>>
> >>>>Thanks,

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@alumni.caltech.edu>
