incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: nodetool cleanup isn't cleaning up?
Date Thu, 03 Jun 2010 06:15:40 GMT
Then the next step is to check StorageService.getRangeToEndpointMap via JMX.
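
A minimal sketch of the kind of map that call reports, for comparison with the real JMX output. It is not Cassandra's code: the placement functions are simplified models (rack-unaware as "next node along the ring", rack-aware as the wiki sentence quoted below, replica 2 only), and the datacenter() helper assumes the third octet of the address identifies the data center, as the discussion below does. All function names are hypothetical; the tokens are the ones from the ring after the move.

```python
# Ring after the move, as (token, node address), sorted by token.
RING = sorted([
    (56713727820156410577229101238628035242, "192.168.252.124"),
    (56713727820156410577229101238628035243, "192.168.252.99"),
    (85070591730234615865843651857942052863, "192.168.252.125"),
    (113427455640312821154458202477256070485, "192.168.254.57"),
    (141784319550391026443072753096570088106, "192.168.254.58"),
    (170141183460469231731687303715884105727, "192.168.254.59"),
])


def datacenter(address):
    # Assumption: the third octet names the data center (252 vs. 254).
    return address.split(".")[2]


def successors(i):
    # All other nodes, in ring order, starting right after position i.
    n = len(RING)
    return [RING[(i + k) % n][1] for k in range(1, n)]


def replicas_rack_unaware(i, rf=2):
    # Simplified model: the primary plus the next rf-1 nodes along the ring.
    return [RING[i][1]] + successors(i)[: rf - 1]


def replicas_rack_aware(i, rf=2):
    # Simplified model of the wiki sentence, replica 2 only:
    # the first node along the ring that is in another data center.
    if rf != 2:
        raise ValueError("this sketch only models replication factor 2")
    primary = RING[i][1]
    for node in successors(i):
        if datacenter(node) != datacenter(primary):
            return [primary, node]
    return [primary]


def range_to_endpoints(place, rf=2):
    # Map each primary range (predecessor token, own token] to its replicas.
    mapping = {}
    for i, (token, _) in enumerate(RING):
        start = RING[i - 1][0]  # index -1 wraps around to the last node
        mapping[(start, token)] = place(i, rf)
    return mapping


def secondary_for(node, mapping):
    # Primaries whose ranges this node holds as a non-primary replica.
    return [eps[0] for eps in mapping.values() if node in eps[1:]]


if __name__ == "__main__":
    for name, place in [("rack-unaware", replicas_rack_unaware),
                        ("rack-aware", replicas_rack_aware)]:
        mapping = range_to_endpoints(place)
        print(name, "-> .99 is a secondary replica for:",
              secondary_for("192.168.252.99", mapping))
```

Under these assumptions, .99 holds a second copy of .124's range in the rack-unaware model and of no range in the rack-aware model. If the live map lists .99 as an endpoint for other ranges, the actual placement differs from this model.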

On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory <rantav@gmail.com> wrote:
> I'm using RackAwareStrategy, but I think it still doesn't make sense.
> Let's see what I missed.
> According to http://wiki.apache.org/cassandra/Operations:
>
> RackAwareStrategy: replica 2 is placed in the first node along the ring that
> belongs in another data center than the first; the remaining N-2 replicas,
> if any, are placed on the first nodes along the ring in the same rack as the
> first
>
> 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |<--|
> 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |   ^
> 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v   |
> 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |   ^
> 192.168.254.58   Up   99.74 MB  141784319550391026443072753096570088106  v   |
> 192.168.254.59   Up   99.94 MB  170141183460469231731687303715884105727  |-->|
> Alright, so I made a mistake and didn't use the alternate-datacenter
> suggestion on the page, so the first node of every DC is overloaded with
> replicas. However, the current situation still doesn't make sense to me.
> .252.124 will be overloaded b/c it has the first token in the .252 DC.
> .254.57 will also be overloaded since it has the first token in the .254 DC.
> But for which node does .252.99 serve as a replicator? It's not the first in
> the DC and it's just one single token more than its predecessor (which is
> in the same DC).
> On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> I'm saying that .99 is getting a copy of all the data for which .124
>> is the primary.  (If you are using RackUnawareStrategy.  If you are
>> using RackAwareStrategy it is some other node.)
>>
>> On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory <rantav@gmail.com> wrote:
>> > OK, let me try to translate your answer ;)
>> > Are you saying that the data left on the node consists of
>> > non-primary replicas of rows from the time before the move?
>> > So this implies that when a node moves in the ring, it will affect
>> > distribution of:
>> > - new keys
>> > - old keys' primary node
>> > -- but will not affect distribution of old keys' non-primary replicas.
>> > If so, I still don't understand something... I would expect even the
>> > non-primary replicas of keys to be moved, since if they aren't, how would
>> > they be found? I mean, upon reads the serving node should not care
>> > whether the row is new or old; it should have a consistent and global
>> > mapping of tokens. So I guess this ruins my theory...
>> > What did you mean then? Is this about deletion of non-primary replicated
>> > data?
>> > How does the replication factor affect the load on the moved host then?
>> >
>> > On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis <jbellis@gmail.com>
>> > wrote:
>> >>
>> >> well, there you are then.
>> >>
>> >> On Mon, May 31, 2010 at 2:34 PM, Ran Tavory <rantav@gmail.com> wrote:
>> >> > yes, replication factor = 2
>> >> >
>> >> > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis <jbellis@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> you have replication factor > 1 ?
>> >> >>
>> >> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory <rantav@gmail.com>
>> >> >> wrote:
>> >> >> > I hope I understand nodetool cleanup correctly - it should clean up
>> >> >> > all data that does not (currently) belong to this node. If so, I
>> >> >> > think it might not be working correctly.
>> >> >> > Look at nodes 192.168.252.124 and 192.168.252.99 below:
>> >> >> > 192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202    |<--|
>> >> >> > 192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242   |   ^
>> >> >> > 192.168.252.125  Up   82.91 MB  85070591730234615865843651857942052863   v   |
>> >> >> > 192.168.254.57   Up   366.6 MB  113427455640312821154458202477256070485  |   ^
>> >> >> > 192.168.254.58   Up   88.44 MB  141784319550391026443072753096570088106  v   |
>> >> >> > 192.168.254.59   Up   88.45 MB  170141183460469231731687303715884105727  |-->|
>> >> >> > I wanted 124 to take all the load from 99, so I issued a move
>> >> >> > command:
>> >> >> > $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243
>> >> >> >
>> >> >> > This command tells 99 to take the space b/w
>> >> >> > (56713727820156410577229101238628035242, 56713727820156410577229101238628035243]
>> >> >> > which is basically just one item in the token space, almost
>> >> >> > nothing... I wanted it to be very slim (just playing around).
>> >> >> > So, next I get this:
>> >> >> > 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |<--|
>> >> >> > 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |   ^
>> >> >> > 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v   |
>> >> >> > 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |   ^
>> >> >> > 192.168.254.58   Up   99.74 MB  141784319550391026443072753096570088106  v   |
>> >> >> > 192.168.254.59   Up   99.94 MB  170141183460469231731687303715884105727  |-->|
>> >> >> > The tokens are correct, but it seems that 99 still has a lot of
>> >> >> > data. Why?
>> >> >> > OK, that might be b/c it didn't delete its moved data.
>> >> >> > So next I issued a nodetool cleanup, which should have taken care
>> >> >> > of that.
>> >> >> > Only it didn't; node 99 still has 352 MB of data. Why?
>> >> >> > So, you know what, I waited for 1h. Still no good, data wasn't
>> >> >> > cleaned up.
>> >> >> > I restarted the server. Still, data wasn't cleaned up... I issued
>> >> >> > a cleanup again... still no good... what's up with this node?
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Jonathan Ellis
>> >> >> Project Chair, Apache Cassandra
>> >> >> co-founder of Riptano, the source for professional Cassandra support
>> >> >> http://riptano.com
>> >> >
>> >> >
>> >
>> >
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
