getRangeToEndpointMap is very useful, thanks, I didn't know about it...

however, I've reconfigured my cluster since (moved some nodes and tokens), so now the problem is gone. I guess I'll use getRangeToEndpointMap next time I see something like this...

On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

Then the next step is to check StorageService.getRangeToEndpointMap via JMX
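[Editorial note: once you have that map, a quick way to read it is to total each endpoint's share of the token space; an overloaded replica target stands out immediately. A toy Python sketch, not Cassandra's API: the map shape {(start, end): [endpoints]} with (start, end] semantics and the ring-size parameter are assumptions for illustration.]

```python
# Toy sketch: sum each endpoint's share of the token space from a
# range -> endpoints map shaped like what getRangeToEndpointMap reports.
# Assumed shape: {(start, end): [endpoints]} with (start, end] semantics.

def ownership(range_map, ring=2**127):  # 2**127 = RandomPartitioner's ring size
    share = {}
    for (start, end), endpoints in range_map.items():
        width = (end - start) % ring  # modulo handles the wrapping range
        for ep in endpoints:
            share[ep] = share.get(ep, 0) + width
    return share

# Tiny made-up ring of size 100 with RF=2: each range is stored on two nodes.
toy = {
    (30, 10): ["A", "B"],  # the wrapping range
    (10, 20): ["B", "C"],
    (20, 30): ["C", "A"],
}
print(ownership(toy, ring=100))  # {'A': 90, 'B': 90, 'C': 20}
```

With RF=2 the shares must sum to twice the ring size, which is a handy sanity check on the map itself.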


On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory <rantav@gmail.com> wrote:

> I'm using RackAwareStrategy. But it still doesn't make sense, I think...
> let's see what I missed...
> According to http://wiki.apache.org/cassandra/Operations
>
> RackAwareStrategy: replica 2 is placed in the first node along the ring that
> belongs in another data center than the first; the remaining N-2 replicas,
> if any, are placed on the first nodes along the ring in the same rack as the
> first.
>
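[Editorial note: the rule quoted above can be modeled in a few lines, which makes the question below concrete. This is my own toy sketch, not Cassandra's implementation; the DC labels are inferred from the IPs in the thread. With RF=2 it shows that a node which is not the first in its DC picks up no replica-2 load at all.]

```python
# Toy model of the RackAwareStrategy rule quoted above (a sketch, not
# Cassandra's code): replica 1 is the primary (first token >= the key);
# replica 2 is the first node along the ring in a different data center;
# any remaining replicas are the next nodes in the primary's own DC.
from bisect import bisect_left

def replicas(tokens, dc, key_token, rf=2):
    n = len(tokens)
    start = bisect_left(tokens, key_token) % n  # primary, with ring wrap
    primary = tokens[start]
    result = [primary]
    # replica 2: first node along the ring in another data center
    i = (start + 1) % n
    while i != start:
        if dc[tokens[i]] != dc[primary]:
            result.append(tokens[i])
            break
        i = (i + 1) % n
    # replicas 3..rf: next nodes along the ring in the primary's DC
    i = (start + 1) % n
    while len(result) < rf and i != start:
        if dc[tokens[i]] == dc[primary] and tokens[i] not in result:
            result.append(tokens[i])
        i = (i + 1) % n
    return result

# The ring from this thread; DC labels inferred from the 252.x / 254.x subnets.
t124 = 56713727820156410577229101238628035242
t99  = 56713727820156410577229101238628035243
t125 = 85070591730234615865843651857942052863
t57  = 113427455640312821154458202477256070485
t58  = 141784319550391026443072753096570088106
t59  = 170141183460469231731687303715884105727
tokens = [t124, t99, t125, t57, t58, t59]
dc = {t124: "252", t99: "252", t125: "252",
      t57: "254", t58: "254", t59: "254"}

# .99 is never "the first node in another DC" for any primary, so with
# RF=2 it receives no replica-2 data; .124 and .57 absorb all of it.
second_replicas = {replicas(tokens, dc, t)[1] for t in tokens}
print(t99 in second_replicas)  # False
```

This matches the observed load: the replica-2 slots all land on .252.124 and .254.57, the first node of each DC.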

> 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |<--|
> 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |  ^
> 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v  |
> 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |  ^
> 192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106  v  |
> 192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727  |-->|

> Alright, so I made a mistake and didn't use the alternate-datacenter
> suggestion on the page, so the first node of every DC is overloaded with
> replicas. However, the current situation still doesn't make sense to me.
> .252.124 will be overloaded b/c it has the first token in the .252 DC.
> .254.57 will also be overloaded since it has the first token in the .254 DC.
> But for which node does .252.99 serve as a replica? It's not the first in
> the DC and it's just one single token more than its predecessor (which is
> in the same DC).

> On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

>>

>> I'm saying that .99 is getting a copy of all the data for which .124
>> is the primary. (If you are using RackUnawareStrategy. If you are
>> using RackAware it is some other node.)

>>

>> On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory <rantav@gmail.com> wrote:

>> > OK, let me try and translate your answer ;)
>> > Are you saying that the data that was left on the node is
>> > non-primary replicas of rows from the time before the move?
>> > So this implies that when a node moves in the ring, it will affect
>> > distribution of:
>> > - new keys
>> > - old keys' primary node
>> > -- but will not affect distribution of old keys' non-primary replicas.
>> > If so, I still don't understand something... I would expect even the
>> > non-primary replicas of keys to be moved, since if they aren't, how
>> > would they be found? I mean, upon reads the serving node should not care
>> > whether the row is new or old; it should have a consistent and global
>> > mapping of tokens. So I guess this ruins my theory...
>> > What did you mean then? Is this deletion of non-primary replicated
>> > data? How does the replication factor affect the load on the moved host then?

>> >

>> > On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis <jbellis@gmail.com>

>> > wrote:

>> >>

>> >> well, there you are then.

>> >>

>> >> On Mon, May 31, 2010 at 2:34 PM, Ran Tavory <rantav@gmail.com> wrote:

>> >> > yes, replication factor = 2

>> >> >

>> >> > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis <jbellis@gmail.com>

>> >> > wrote:

>> >> >>

>> >> >> you have replication factor > 1 ?

>> >> >>

>> >> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory <rantav@gmail.com>

>> >> >> wrote:

>> >> >> > I hope I understand nodetool cleanup correctly - it should clean up
>> >> >> > all data that does not (currently) belong to this node. If so, I
>> >> >> > think it might not be working correctly.
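[Editorial note: that understanding of cleanup can be written down as a toy, a conceptual sketch rather than the real code path: keep only the rows whose tokens fall inside a range the node still owns. The function names and range representation here are made up for illustration.]

```python
# Conceptual sketch of what cleanup is expected to do (not Cassandra's
# implementation): drop rows whose tokens fall outside the node's ranges.

def in_range(t, start, end, ring=2**127):
    # membership in the half-open ring interval (start, end]
    return 0 < (t - start) % ring <= (end - start) % ring

def cleanup(stored_tokens, owned_ranges, ring=2**127):
    return [t for t in stored_tokens
            if any(in_range(t, s, e, ring) for s, e in owned_ranges)]

# Toy ring of size 100: the node owns (90, 10], a range that wraps.
print(cleanup([5, 50, 95, 10], [(90, 10)], ring=100))  # [5, 95, 10]
```

The modular arithmetic is what makes the wrapping range work; 50 is dropped because it lies outside (90, 10].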

>> >> >> > Look at nodes 192.168.252.124 and 192.168.252.99 below
>> >> >> > 192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202    |<--|
>> >> >> > 192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242   |  ^
>> >> >> > 192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863   v  |
>> >> >> > 192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485  |  ^
>> >> >> > 192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106  v  |
>> >> >> > 192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727  |-->|

>> >> >> > I wanted 124 to take all the load from 99. So I issued a move
>> >> >> > command.
>> >> >> > $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243
>> >> >> >
>> >> >> > This command tells 99 to take the space b/w
>> >> >> > (56713727820156410577229101238628035242, 56713727820156410577229101238628035243],
>> >> >> > which is basically just one item in the token space, almost
>> >> >> > nothing... I wanted it to be very slim (just playing around).
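[Editorial note: "just one item in the token space" is easy to verify with toy arithmetic, taking 2**127 as RandomPartitioner's token-space size.]

```python
# Width of the interval (a, b] that the move assigned to .99.
a = 56713727820156410577229101238628035242
b = 56713727820156410577229101238628035243
width = (b - a) % 2**127   # modulo handles ranges that wrap the ring
print(width)               # 1 -- one token out of 2**127, ~5.9e-39 of the ring
```

So .99's primary range really is negligible; any data it still holds must be replica data or leftovers that cleanup should remove.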

>> >> >> > So, next I get this:
>> >> >> > 192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242   |<--|
>> >> >> > 192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243   |  ^
>> >> >> > 192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863   v  |
>> >> >> > 192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485  |  ^
>> >> >> > 192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106  v  |
>> >> >> > 192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727  |-->|

>> >> >> > The tokens are correct, but it seems that 99 still has a lot of
>> >> >> > data. Why? OK, that might be b/c it didn't delete its moved data.
>> >> >> > So next I issued a nodetool cleanup, which should have taken care
>> >> >> > of that. Only it didn't; node 99 still has 352 MB of data. Why?
>> >> >> > So, you know what, I waited for 1h. Still no good, data wasn't
>> >> >> > cleaned up. I restarted the server. Still, data wasn't cleaned
>> >> >> > up... I issued a cleanup again... still no good... what's up
>> >> >> > with this node?

>> >> >> >

>> >> >> >

>> >> >>

>> >> >>

>> >> >>

>> >> >> --

>> >> >> Jonathan Ellis

>> >> >> Project Chair, Apache Cassandra

>> >> >> co-founder of Riptano, the source for professional Cassandra support

>> >> >> http://riptano.com



Jonathan Ellis

Project Chair, Apache Cassandra

co-founder of Riptano, the source for professional Cassandra support

http://riptano.com