cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13740) Orphan hint file gets created while node is being removed from cluster
Date Mon, 21 Aug 2017 21:02:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135814#comment-16135814
] 

Aleksey Yeschenko commented on CASSANDRA-13740:
-----------------------------------------------

Hey. There are a few [code style|https://wiki.apache.org/cassandra/CodeStyle] issues: we don't
use {{final}} for arguments and local variables, and brackets always go on new lines. Also, the
patch doesn't wait for the {{closeWriter}} future to complete.

A more interesting issue is that of the delay. {{RING_DELAY}} doesn't have anything to
do with hints. What does matter is write timeouts, and the timeout reporter of {{MessagingService}}'s
callbacks expiring map firing: that's where the race ultimately is.

Also, we aren't fixing the issue of {{nodetool truncatehints}} not being able to clean up
after we excise.

The more I think about it, the more I'm inclined to just correct that last issue and leave
everything else as is (and also commit your unit tests, thanks for those).
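
To illustrate the point about waiting on the {{closeWriter}} future, here is a minimal sketch, not the actual patch. {{AwaitCloseSketch}}, {{closeThenCleanUp}} and the 10 second timeout are hypothetical names and values chosen for illustration; the only assumption taken from this thread is that closing the writer is submitted to an executor and returns a future.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in; these are not Cassandra's classes.
class AwaitCloseSketch
{
    // Returns true if the writer was observed closed before cleanup would run.
    static boolean closeThenCleanUp()
    {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicBoolean writerClosed = new AtomicBoolean(false);
        try
        {
            // Stand-in for the close task: it runs asynchronously on the executor.
            Future<?> closeFuture = executor.submit(() -> writerClosed.set(true));

            // Block until the close task has finished; without this call,
            // any cleanup below could run while the writer is still open.
            closeFuture.get(10, TimeUnit.SECONDS);

            // Cleanup (for example, deleting files) would go here.
            return writerClosed.get();
        }
        catch (InterruptedException | ExecutionException | TimeoutException e)
        {
            throw new RuntimeException(e);
        }
        finally
        {
            executor.shutdown();
        }
    }
}
```

Calling {{AwaitCloseSketch.closeThenCleanUp()}} returns true because the result is read only after {{get}} has returned.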

> Orphan hint file gets created while node is being removed from cluster
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-13740
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13740
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Minor
>             Fix For: 3.0.x, 3.11.x
>
>         Attachments: 13740-3.0.15.txt, gossip_hang_test.py
>
>
> I found this new issue during my testing: whenever a node is being removed, a hint
file for that node gets written and stays inside the hints directory forever. I debugged the
code and found that this is due to a race condition between [HintsWriteExecutor.java::flush
| https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195]
and [HintsWriteExecutor.java::closeWriter | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L106].
>  
> *Time t1* The node is down; as a result, hints are being written by [HintsWriteExecutor.java::flush
| https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195]
> *Time t2* The node is removed from the cluster, which calls [HintsService.java::exciseStore
| https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L327]
and removes the hint files for the node being removed
> *Time t3* The mutation stage keeps pumping hints through [HintsService.java::write | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L145]
which again calls [HintsWriteExecutor.java::flush | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215]
and a new orphan file gets created
> I was writing a new dtest for {CASSANDRA-13562, CASSANDRA-13308} and that helped me reproduce
this new bug. I will submit a patch for this new dtest later.
> I also tried the following to check how this orphan hint file responds:
> 1. I tried {{nodetool truncatehints <node>}}, but it fails because the node is no longer
part of the ring
> 2. I then tried {{nodetool truncatehints}}, but that still doesn’t remove the hint file because
it is not yet included in the [dispatchDequeue | https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsStore.java#L53]
> Steps to reproduce:
> Please find the attached dtest Python file {{gossip_hang_test.py}}, which reproduces this
bug.
> Solution:
> This is due to the race condition described above. Since {{HintsWriteExecutor.java}} creates
a thread pool with only 1 worker, the solution is fairly simple. Whenever we [HintsService.java::excise
| https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L303]
a host, just store it in memory, and check for already excised hosts inside [HintsWriteExecutor.java::flush
| https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215].
If an already excised host is found, ignore its hints.
> Jaydeep
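
The solution proposed in the description above (remember excised hosts in memory and skip them in the flush path) could look roughly like the following. This is a sketch under stated assumptions, not the attached patch: {{ExcisedHostSketch}}, its fields, and the in-memory set standing in for hint files on disk are all hypothetical; only the single-worker executor and the excise/flush ordering come from the description.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the hints write path; not Cassandra's classes.
class ExcisedHostSketch
{
    // One worker, mirroring the single-threaded pool mentioned in the description.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // Hosts that have been excised; flushes for these are ignored.
    private final Set<UUID> excised = ConcurrentHashMap.newKeySet();
    // Stand-in for hint files on disk: a host id is present if its file exists.
    private final Set<UUID> filesOnDisk = ConcurrentHashMap.newKeySet();

    Future<?> flush(UUID hostId)
    {
        return executor.submit(() -> {
            if (excised.contains(hostId))
                return; // host already excised: drop the hints, create no file
            filesOnDisk.add(hostId);
        });
    }

    Future<?> excise(UUID hostId)
    {
        // Record the host before queuing the delete so that any later flush sees it.
        excised.add(hostId);
        return executor.submit(() -> { filesOnDisk.remove(hostId); });
    }

    // Replays t1..t3 from the description; returns true if an orphan file remains.
    static boolean orphanAfterExcise()
    {
        ExcisedHostSketch sketch = new ExcisedHostSketch();
        UUID host = UUID.randomUUID();
        try
        {
            sketch.flush(host).get();  // t1: hints written while the node is down
            sketch.excise(host).get(); // t2: node removed, hint files deleted
            sketch.flush(host).get();  // t3: late hints arrive after the excise
            return sketch.filesOnDisk.contains(host);
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
        finally
        {
            sketch.executor.shutdown();
        }
    }
}
```

In this sketch {{ExcisedHostSketch.orphanAfterExcise()}} returns false: the flush at t3 finds the host in the excised set and skips the write. It does not address the timing concern raised in the comment, nor the {{nodetool truncatehints}} cleanup.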



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


