I'm using RackAwareStrategy, but it still doesn't make sense to me... let's see what I missed...

According to http://wiki.apache.org/cassandra/Operations

RackAwareStrategy: replica 2 is placed in the first node along the ring that
belongs in **another** data center than the first; the remaining N-2 replicas,
if any, are placed on the first nodes along the ring in the **same** rack as
the first.
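A minimal sketch of that placement rule, under stated assumptions: nodes are kept in token order, there are at least two data centers on the ring, and the `dc_of`/`rack_of` helpers and node names are made up for illustration (this is not Cassandra's actual API).

```python
# Sketch of the RackAwareStrategy rule quoted above: replica 2 goes to the
# first node clockwise in *another* data center; the remaining N-2 replicas
# go to the first nodes clockwise in the *same* rack as the primary.
# Assumes at least two DCs and enough same-rack nodes exist on the ring.
def rack_aware_replicas(ring, primary_index, n, dc_of, rack_of):
    primary = ring[primary_index]
    replicas = [primary]
    # Replica 2: walk clockwise until we leave the primary's data center.
    i = (primary_index + 1) % len(ring)
    while dc_of(ring[i]) == dc_of(primary):
        i = (i + 1) % len(ring)
    replicas.append(ring[i])
    # Remaining N-2 replicas: first nodes clockwise in the primary's rack.
    i = (primary_index + 1) % len(ring)
    while len(replicas) < n:
        if rack_of(ring[i]) == rack_of(primary) and ring[i] not in replicas:
            replicas.append(ring[i])
        i = (i + 1) % len(ring)
    return replicas

# Toy topology: each name encodes "dc-rack-host"; both helpers are made up.
ring = ["1-1-a", "1-1-b", "1-2-c", "2-1-a", "2-1-b", "2-2-c"]
dc = lambda node: node.split("-")[0]
rack = lambda node: node.split("-")[:2]
print(rack_aware_replicas(ring, 0, 3, dc, rack))  # ['1-1-a', '2-1-a', '1-1-b']
```

Note how, with this rule, the first node clockwise in each DC collects replica-2 copies from every foreign-DC range that precedes it, which is exactly the overload discussed below.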

192.168.252.124 Up 803.33 MB 56713727820156410577229101238628035242  |<--|
192.168.252.99  Up 352.85 MB 56713727820156410577229101238628035243  |   ^
192.168.252.125 Up 134.24 MB 85070591730234615865843651857942052863  v   |
192.168.254.57  Up 676.41 MB 113427455640312821154458202477256070485 |   ^
192.168.254.58  Up  99.74 MB 141784319550391026443072753096570088106 v   |
192.168.254.59  Up  99.94 MB 170141183460469231731687303715884105727 |-->|


Alright, so I made a mistake and didn't follow the alternate-datacenter token-assignment suggestion on that page, so the first node of every DC is overloaded with replicas. However, the current situation still doesn't make sense to me. .252.124 will be overloaded because it has the first token in the .252 DC, and .254.57 will be overloaded because it has the first token in the .254 DC. But for which node does .252.99 serve as a replica? It's not the first in its DC, and its token is just one more than its predecessor's (which is in the same DC).
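One way to sanity-check this is pure arithmetic on the tokens printed above: each node's primary range is `(predecessor's token, own token]` in RandomPartitioner's `2**127` token space. A small sketch (nothing Cassandra runs, just the ring math):

```python
# Tokens copied from the nodetool ring output above. Each node's primary
# range is (predecessor's token, own token], wrapping around the ring;
# 2**127 is RandomPartitioner's token space.
RING = [
    ("192.168.252.124", 56713727820156410577229101238628035242),
    ("192.168.252.99",  56713727820156410577229101238628035243),
    ("192.168.252.125", 85070591730234615865843651857942052863),
    ("192.168.254.57",  113427455640312821154458202477256070485),
    ("192.168.254.58",  141784319550391026443072753096570088106),
    ("192.168.254.59",  170141183460469231731687303715884105727),
]
TOKEN_SPACE = 2 ** 127

def primary_range_width(i):
    prev_token = RING[i - 1][1]            # i == 0 wraps to the last node
    return (RING[i][1] - prev_token) % TOKEN_SPACE

for i, (node, _) in enumerate(RING):
    print(node, primary_range_width(i))
```

.99's primary range comes out exactly 1 token wide, so essentially all of its 352 MB must be replica copies of other nodes' ranges, not its own primary data.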

On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

I'm saying that .99 is getting a copy of all the data for which .124 is
the primary. (That is, if you are using RackUnawareStrategy; if you are
using RackAwareStrategy it is some other node.)
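For contrast with the rack-aware rule quoted earlier, the rack-unaware case Jonathan describes is trivial: the replica set for a range is the primary plus the next N-1 nodes clockwise, ignoring DCs and racks entirely. A hedged sketch (illustration only, not Cassandra's code):

```python
# Rack-unaware placement: replicas for a primary's range are simply the
# next N-1 nodes clockwise on the ring after the primary.
def rack_unaware_replicas(ring, primary_index, n):
    return [ring[(primary_index + k) % len(ring)] for k in range(n)]

ring = ["192.168.252.124", "192.168.252.99", "192.168.252.125",
        "192.168.254.57", "192.168.254.58", "192.168.254.59"]
# With replication factor 2, .99 holds a copy of .124's primary range:
print(rack_unaware_replicas(ring, 0, 2))
# ['192.168.252.124', '192.168.252.99']
```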


On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory <rantav@gmail.com> wrote:

> ok, let me try and translate your answer ;)
> Are you saying that the data that was left on the node consists of
> non-primary replicas of rows from the time before the move?
> So this implies that when a node moves in the ring, it will affect
> distribution of:
> - new keys
> - old keys' primary node
> -- but will not affect distribution of old keys' non-primary replicas.
> If so, I still don't understand something... I would expect even the
> non-primary replicas of keys to be moved, since if they aren't, how would
> they be found? I mean, upon reads the serving node should not care whether
> a row is new or old; it should have a consistent and global mapping of
> tokens. So I guess this ruins my theory...
> What did you mean then? Is this deletion of non-primary replicated data?
> How does the replication factor affect the load on the moved host then?

>

> On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

>>

>> well, there you are then.

>>

>> On Mon, May 31, 2010 at 2:34 PM, Ran Tavory <rantav@gmail.com> wrote:

>> > yes, replication factor = 2

>> >

>> > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis <jbellis@gmail.com>

>> > wrote:

>> >>

>> >> you have replication factor > 1 ?

>> >>

>> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory <rantav@gmail.com> wrote:

>> >> > I hope I understand nodetool cleanup correctly - it should clean up
>> >> > all data that does not (currently) belong to this node. If so, I
>> >> > think it might not be working correctly.
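The "belongs to this node" notion above is the crux of the thread: a key belongs to a node if that node is anywhere in the key's replica set, not just when it is the primary. A toy sketch of that lookup, assuming rack-unaware placement and made-up node names (not Cassandra's implementation):

```python
# A key belongs to a node iff the node is in the replica set of the key's
# range. Cleanup drops only keys for which the node is neither primary nor
# replica, so replicated data legitimately stays put after a move.
def owners(ring_tokens, key_token, n):
    """ring_tokens: (node, token) pairs sorted by token; returns n owners."""
    # Primary: first node clockwise whose token is >= the key's token.
    for i, (_, tok) in enumerate(ring_tokens):
        if key_token <= tok:
            break
    else:
        i = 0  # key is past the last token: wrap to the first node
    return [ring_tokens[(i + k) % len(ring_tokens)][0] for k in range(n)]

ring = [("A", 100), ("B", 200), ("C", 300)]
print(owners(ring, 150, 2))  # ['B', 'C']: key 150 belongs to B and to C
print(owners(ring, 350, 2))  # ['A', 'B']: wraps around the ring
```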

>> >> > Look at nodes 192.168.252.124 and 192.168.252.99 below

>> >> > 192.168.252.99  Up 279.35 MB 3544607988759775661076818827414252202   |<--|
>> >> > 192.168.252.124 Up 167.23 MB 56713727820156410577229101238628035242  |   ^
>> >> > 192.168.252.125 Up  82.91 MB 85070591730234615865843651857942052863  v   |
>> >> > 192.168.254.57  Up 366.6 MB  113427455640312821154458202477256070485 |   ^
>> >> > 192.168.254.58  Up  88.44 MB 141784319550391026443072753096570088106 v   |
>> >> > 192.168.254.59  Up  88.45 MB 170141183460469231731687303715884105727 |-->|

>> >> > I wanted 124 to take all the load from 99, so I issued a move command.

>> >> > $ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243

>> >> >

>> >> > This command tells 99 to take the range
>> >> > (56713727820156410577229101238628035242, 56713727820156410577229101238628035243],
>> >> > which is basically just one item in the token space, almost nothing...
>> >> > I wanted it to be very slim (just playing around).

>> >> > So, next I get this:

>> >> > 192.168.252.124 Up 803.33 MB 56713727820156410577229101238628035242  |<--|
>> >> > 192.168.252.99  Up 352.85 MB 56713727820156410577229101238628035243  |   ^
>> >> > 192.168.252.125 Up 134.24 MB 85070591730234615865843651857942052863  v   |
>> >> > 192.168.254.57  Up 676.41 MB 113427455640312821154458202477256070485 |   ^
>> >> > 192.168.254.58  Up  99.74 MB 141784319550391026443072753096570088106 v   |
>> >> > 192.168.254.59  Up  99.94 MB 170141183460469231731687303715884105727 |-->|

>> >> > The tokens are correct, but it seems that 99 still has a lot of
>> >> > data. Why? OK, that might be because it didn't delete its moved data.
>> >> > So next I issued a nodetool cleanup, which should have taken care of
>> >> > that. Only it didn't; node 99 still has 352 MB of data. Why?
>> >> > So, you know what, I waited for an hour. Still no good, data wasn't
>> >> > cleaned up. I restarted the server. Still, data wasn't cleaned up...
>> >> > I issued a cleanup again... still no good... what's up with this node?

>> >> >

>> >> >

>> >>

>> >>

>> >>

>> >> --

>> >> Jonathan Ellis

>> >> Project Chair, Apache Cassandra

>> >> co-founder of Riptano, the source for professional Cassandra support

>> >> http://riptano.com

>> >

>> >

>>

>>

>>


>

>
