cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: balancing load
Date Mon, 17 Jan 2011 14:54:05 GMT
On Mon, Jan 17, 2011 at 2:44 AM, aaron morton <aaron@thelastpickle.com> wrote:
> The nodes will not automatically delete stale data; to do that you need to run
> nodetool cleanup.
>
> See step 3 in the Range Changes > Bootstrap http://wiki.apache.org/cassandra/Operations#Range_changes
>
> If you are feeling paranoid beforehand, you could run nodetool repair on each node in
> turn to make sure they have the correct data.
> http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data
>
> You may also have some tombstones in there; they will not be deleted until after GCGraceSeconds.
> http://wiki.apache.org/cassandra/DistributedDeletes
>
> Hope that helps.
> Aaron
>
> On 17 Jan 2011, at 20:34, Karl Hiramoto wrote:
>
>> Thanks for the help.  I used "nodetool move", so now each node owns 20%
>> of the space, but it seems that the data load is still mostly on 2 nodes.
>>
>>
>> nodetool  --host slave4 ring
>> Address       Status State   Load       Owns    Token
>>                                                 136112946768375385385349842972707284580
>> 10.1.4.10     Up     Normal  335.9 MB   20.00%  0
>> 10.1.4.12     Up     Normal  54.42 KB   20.00%  34028236692093846346337460743176821145
>> 10.1.4.13     Up     Normal  59.32 KB   20.00%  68056473384187692692674921486353642290
>> 10.1.4.14     Up     Normal  6.33 GB    20.00%  102084710076281539039012382229530463435
>> 10.1.4.15     Up     Normal  6.36 GB    20.00%  136112946768375385385349842972707284580
>>
>>
>>
>>
>> --
>> Karl
>
>

Just to head off the next possible problem: if you run 'nodetool cleanup'
on each node and some of your nodes still have more data than others,
then it probably means you are writing the majority of your data to a few
keys. (You probably do not want to do that.)
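
To see why a few hot keys defeat evenly spaced tokens, here is a rough
Python sketch. It assumes the cluster uses RandomPartitioner (the tokens
in the ring output quoted above look like it) and that a key's token is
the absolute value of its MD5 digest read as a signed integer. It ignores
replication and only picks the first node responsible for a key. The
function names are made up for illustration.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# (token, address) pairs taken from the `nodetool ring` output in this thread.
RING = [
    (0, "10.1.4.10"),
    (34028236692093846346337460743176821145, "10.1.4.12"),
    (68056473384187692692674921486353642290, "10.1.4.13"),
    (102084710076281539039012382229530463435, "10.1.4.14"),
    (136112946768375385385349842972707284580, "10.1.4.15"),
]
TOKENS = [token for token, _ in RING]


def token_for(key: bytes) -> int:
    # Assumption: RandomPartitioner-style token, i.e. abs() of the MD5
    # digest interpreted as a signed big-endian integer.
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def primary_node(key: bytes) -> str:
    # A node owns the range (previous token, its own token]. Tokens above
    # the highest one wrap around to the first node in the ring.
    idx = bisect_left(TOKENS, token_for(key))
    return RING[idx % len(RING)][1]


def writes_per_node(keys):
    # Count how many writes land on each node (first replica only).
    return Counter(primary_node(key) for key in keys)


if __name__ == "__main__":
    many_keys = [f"key-{i}".encode() for i in range(5000)]
    one_key = [b"hot-key"] * 5000
    print("many distinct keys:", dict(writes_per_node(many_keys)))
    print("one hot key:       ", dict(writes_per_node(one_key)))
```

With many distinct keys the writes spread over all five nodes; with one
key, every write goes to the same node no matter how the tokens are set.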

If that happens, you can use nodetool cfstats on each node and check
that the 'max row compacted size' is roughly the same on all nodes. If
you have one or two really big rows, that could explain your imbalance.
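
If you do not want to eyeball the cfstats output from every node, a
small script can do the comparison. This is only a sketch: it works on
text you have already captured (for example from
nodetool --host <node> cfstats), and the "Column Family:" and
"Compacted row maximum size:" labels are assumptions about the output
format, so adjust them to whatever your version prints. The helper names
and the factor of 10 are arbitrary.

```python
def max_row_sizes(cfstats_text, label="Compacted row maximum size"):
    """Return {column_family: max_row_size} parsed from cfstats text.

    Assumes each column family section starts with 'Column Family: <name>'
    and contains a '<label>: <integer>' line. Both labels are assumptions.
    """
    sizes = {}
    current_cf = None
    for raw in cfstats_text.splitlines():
        line = raw.strip()
        if line.startswith("Column Family:"):
            current_cf = line.split(":", 1)[1].strip()
        elif line.startswith(label + ":") and current_cf is not None:
            sizes[current_cf] = int(line.split(":", 1)[1].strip())
    return sizes


def oversized_rows(per_node, factor=10):
    """List (host, column_family, size) where a node's largest row is more
    than `factor` times the smallest value seen for that column family."""
    flagged = []
    cfs = {cf for sizes in per_node.values() for cf in sizes}
    for cf in sorted(cfs):
        values = {h: s[cf] for h, s in per_node.items() if cf in s}
        baseline = max(min(values.values()), 1)  # avoid comparing against 0
        for host, size in sorted(values.items()):
            if size > factor * baseline:
                flagged.append((host, cf, size))
    return flagged


if __name__ == "__main__":
    # Hypothetical captured output, NOT real data from this cluster.
    sample = {
        "node-a": "Column Family: Standard1\nCompacted row maximum size: 1200\n",
        "node-b": "Column Family: Standard1\nCompacted row maximum size: 950000000\n",
    }
    per_node = {host: max_row_sizes(text) for host, text in sample.items()}
    for host, cf, size in oversized_rows(per_node):
        print(f"{host}: {cf} has a max compacted row of {size} bytes")
```

Any node it flags is a good candidate for holding one of those really
big rows.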
