incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Decommissioned nodes not leaving and Hinted Handoff flood
Date Wed, 10 Jul 2013 23:42:31 GMT
Thanks for sharing. Here is some more information…

> 1 - At first, one of my nodes went down for 5 minutes, and when it came back it got flooded by Hinted Handoff so hard that it could not handle the real-time queries properly. I haven't found a way to prioritize app queries over Hinted Handoff.
You can disable hint delivery with nodetool pausehandoff or reduce the hint throughput https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L50
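To get a feel for what lowering the throttle trades off, here is a back-of-envelope sketch. It assumes the linked setting acts as a flat KB/s cap per delivery thread, which is how I read the 1.2 cassandra.yaml comment for hinted_handoff_throttle_in_kb. It also assumes delivery always runs at that cap. The class and method names are made up for illustration, and real throughput will differ.

```java
// Back-of-envelope estimate of how long one node's hint backlog takes to replay.
// Assumption: delivery runs flat out at a fixed KB/s cap on every delivery thread;
// real throughput will differ (network, target load, other throttling).
public final class HintReplayEstimate {

    /** Seconds to replay backlogBytes at throttleKb KB/s on each of `threads` threads. */
    public static double replaySeconds(long backlogBytes, int throttleKb, int threads) {
        if (throttleKb <= 0 || threads <= 0) {
            throw new IllegalArgumentException("throttle and thread count must be positive");
        }
        double bytesPerSecond = throttleKb * 1024.0 * threads;
        return backlogBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long backlog = 3L * 1024 * 1024 * 1024; // 3 GB, the upper end reported in this thread
        // Halving the cap doubles the replay time, but also halves the rate at
        // which this node pushes hints at the recovering replica.
        for (int throttleKb : new int[] {1024, 512, 256}) {
            double minutes = replaySeconds(backlog, throttleKb, 2) / 60.0;
            System.out.printf("throttle=%d KB/s, 2 threads -> ~%.1f min%n", throttleKb, minutes);
        }
    }
}
```

Under these assumptions, 3 GB over 2 threads drains in about 25.6 minutes at 1024 KB/s and about 102.4 minutes at 256 KB/s. A lower throttle spreads the same replay work over a longer window. Pausing handoff stops it entirely until delivery is resumed.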
 
> 2 - Nodes keep hints for a node that has been removed.
The hints are stored with a TTL equal to the gc_grace_seconds of the CF at the time the hint
is written, so they will eventually be purged by compaction. 
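The following sketch makes the timing concrete. It assumes the TTL rule described above and Cassandra's default gc_grace_seconds of 864000 (10 days); the class and method names are hypothetical. Under those assumptions, a hint written on 9 July stops being live on 19 July. The data only leaves the disk when a compaction runs after that point.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of a hint's lifetime, assuming (as described above) that its TTL is the
// table's gc_grace_seconds at the moment the hint is written.
public final class HintExpiry {

    /** Instant at which a hint written at writtenAt stops being live. */
    public static Instant expiresAt(Instant writtenAt, int gcGraceSeconds) {
        return writtenAt.plusSeconds(gcGraceSeconds);
    }

    /** True once the TTL has elapsed; the data itself goes away at a later compaction. */
    public static boolean isExpired(Instant writtenAt, int gcGraceSeconds, Instant now) {
        return !now.isBefore(expiresAt(writtenAt, gcGraceSeconds));
    }

    public static void main(String[] args) {
        Instant written = Instant.parse("2013-07-09T12:00:00Z");
        int gcGraceSeconds = 864000; // Cassandra's default gc_grace_seconds (10 days)
        System.out.println("expires at " + expiresAt(written, gcGraceSeconds));
        System.out.println("expired one week later? "
                + isExpired(written, gcGraceSeconds, written.plus(Duration.ofDays(7))));
    }
}
```

A CF with a long gc_grace_seconds keeps hints for a node that will never return for just as long. In that case, deleting them explicitly can be worth it.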

You can also delete the hints using the HintedHandOffManager MBean: https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/db/HintedHandOffManagerMBean.java#L30
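Below is a minimal sketch of calling that bean from a small JMX client. It rests on several assumptions, so check them against the linked source for your version:

- the MBean is registered as org.apache.cassandra.db:type=HintedHandoffManager;
- the operation at the linked line is deleteHintsForEndpoint(String);
- the node exposes JMX on Cassandra's default port 7199 with no JMX authentication.

The class name is made up. The operation acts on the node you connect to, so it has to be run against every node that still holds hints.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: ask one node to drop the hints it holds for a removed endpoint.
// Assumptions (verify against your Cassandra version): the MBean name below,
// the operation deleteHintsForEndpoint(String), JMX on port 7199, no JMX auth.
public final class DeleteHints {

    public static final String HINTS_MBEAN = "org.apache.cassandra.db:type=HintedHandoffManager";
    public static final int DEFAULT_JMX_PORT = 7199; // Cassandra's default JMX port

    /** Standard RMI-based JMX service URL for host:port. */
    public static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Connects to nodeHost and invokes the (assumed) delete operation for removedEndpoint. */
    public static void deleteHintsFor(String nodeHost, String removedEndpoint) throws Exception {
        JMXServiceURL url = new JMXServiceURL(jmxUrl(nodeHost, DEFAULT_JMX_PORT));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            mbs.invoke(new ObjectName(HINTS_MBEAN),
                    "deleteHintsForEndpoint",
                    new Object[] {removedEndpoint},
                    new String[] {String.class.getName()});
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: DeleteHints <node-host> <removed-endpoint-ip>");
            return;
        }
        deleteHintsFor(args[0], args[1]);
    }
}
```

jconsole, which ships with the JDK, can invoke the same MBean operation interactively if you would rather not write a client.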

> 3 - Nodes with 500MB to 3GB of hints stored for a removed node can't be decommissioned; they get stuck after streaming their data.
The hint KS is defined using the LocalStrategy, so it is not replicated. Hints should not be
involved in streaming. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/07/2013, at 12:47 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Hi,
> 
> C* 1.2.2.
> 
> I have removed 4 nodes with "nodetool decommission". 2 of them left without issue, while the other 2 nodes remained "leaving" even after streaming their data.
> 
> The only distinctive thing about these 2 nodes is that they had a lot of hints pending: hints for a node that couldn't come back and that I had removed earlier. I removed it because of the heavy load induced by Hinted Handoff while it was coming back, which caused a lot of latency in our app; after 10 minutes it still hadn't managed to come back.
> 
> So I faced 3 bugs (or problems):
> 
> 1 - At first, one of my nodes went down for 5 minutes, and when it came back it got flooded by Hinted Handoff so hard that it could not handle the real-time queries properly. I haven't found a way to prioritize app queries over Hinted Handoff.
> 2 - Nodes keep hints for a node that has been removed.
> 3 - Nodes with 500MB to 3GB of hints stored for a removed node can't be decommissioned; they get stuck after streaming their data.
> 
> 
> As solutions to these 3 issues, I did the following:
> 
> Solution to 1 - I removed the down node (nodetool removenode)
> Solution to 2 - Stopped the node and removed the system hints
> Solution to 3 - Stopped the node and used removenode instead of decommission
> 
> Now I have no more issues, yet I felt I had to report this. Maybe my experience can help users get out of tricky situations and committers detect some issues, especially with hinted handoff.
> 
> Alain
> 
> 

