cassandra-user mailing list archives

From Terje Marthinussen <tmarthinus...@gmail.com>
Subject Re: Hinted handoff bug?
Date Thu, 01 Dec 2011 23:59:03 GMT
Sorry for not checking the source to see whether things have changed, but I just remembered an
issue I forgot to file a JIRA ticket for.

In the old days, nodes would periodically try to deliver their queued hints.

However, at some stage this was changed so that hints are only delivered when a node is marked up.

However, you can definitely have a scenario where A fails to deliver to B, so it sends the
hint to C instead.

However, B is not really down; it just could not accept that packet at that time. C always
(correctly in this case) thinks B is up, so it never tries to deliver the hints to B.

Will this change fix this, or do we need to bring back the thread that periodically tried to
deliver hints regardless of node status changes?
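
To make the question concrete, here is a rough sketch of what such a periodic retry could look
like. It is not Cassandra's implementation: the class PeriodicHintRetrySketch, the in-memory
pendingHints map, and the tryDeliver callback are all hypothetical names made up for
illustration. The only point is that delivery is attempted on a timer for every endpoint that
still has hints, without waiting for an up/down state change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch, not Cassandra code: retry hint delivery on a timer.
public class PeriodicHintRetrySketch {
    // endpoint -> number of hints still waiting (assumed in-memory model)
    private final Map<String, Integer> pendingHints = new ConcurrentHashMap<>();
    // Assumed callback: returns true if the endpoint accepted its hints.
    private final Predicate<String> tryDeliver;

    public PeriodicHintRetrySketch(Predicate<String> tryDeliver) {
        this.tryDeliver = tryDeliver;
    }

    public void storeHint(String endpoint) {
        pendingHints.merge(endpoint, 1, Integer::sum);
    }

    public int pendingFor(String endpoint) {
        return pendingHints.getOrDefault(endpoint, 0);
    }

    // One retry pass: attempt delivery to every endpoint that still has hints,
    // regardless of whether a status change was ever observed for it.
    public void retryAll() {
        for (String endpoint : pendingHints.keySet()) {
            if (tryDeliver.test(endpoint)) {
                pendingHints.remove(endpoint);
            }
        }
    }

    // Schedule retryAll() to run repeatedly; the caller shuts the executor down.
    public ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::retryAll, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

In the A/B/C scenario above, C would hold the hint for B and hand it over on the next pass,
even though C never saw B go down.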

Regards,
Terje

On 1 Dec 2011, at 19:10, Sylvain Lebresne <sylvain@datastax.com> wrote:

> You're right, good catch.
> Do you mind opening a ticket on jira
> (https://issues.apache.org/jira/browse/CASSANDRA)?
> 
> --
> Sylvain
> 
> On Thu, Dec 1, 2011 at 10:03 AM, Fredrik L Stigbäck
> <fredrik.l.stigback@sitevision.se> wrote:
>> Hi,
>> We're running Cassandra 1.0.3.
>> I've done some testing with 2 nodes (node A, node B), replication factor 2.
>> I take node A down, write some data to node B, and then bring node A back up.
>> Sometimes hints aren't delivered when node A comes up.
>> 
>> I've done some debugging in org.apache.cassandra.db.HintedHandOffManager and
>> sometimes node B ends up in a strange state in method
>> org.apache.cassandra.db.HintedHandOffManager.deliverHints(final InetAddress
>> to), where org.apache.cassandra.db.HintedHandOffManager.queuedDeliveries
>> already has node A in its Set and therefore no hints will ever be delivered
>> to node A.
>> The only reason for this that I can see is that in
>> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(InetAddress
>> endpoint) the hintStore.isEmpty() check returns true and the endpoint (node
>> A)  isn't removed from
>> org.apache.cassandra.db.HintedHandOffManager.queuedDeliveries. Then no hints
>> will ever be delivered again until node B is restarted.
>> Under what conditions will hintStore.isEmpty() return true?
>> Shouldn't the hintStore.isEmpty() check be inside the try {} finally{}
>> clause, removing the endpoint from queuedDeliveries in the finally block?
>> 
>> public void deliverHints(final InetAddress to)
>> {
>>         logger_.debug("deliverHints to {}", to);
>>         if (!queuedDeliveries.add(to))
>>             return;
>>         .......
>> }
>> 
>> private void deliverHintsToEndpoint(InetAddress endpoint) throws
>>         IOException, DigestMismatchException, InvalidRequestException,
>>         TimeoutException,
>> {
>>         ColumnFamilyStore hintStore =
>>             Table.open(Table.SYSTEM_TABLE).getColumnFamilyStore(HINTS_CF);
>>         if (hintStore.isEmpty())
>>             return; // nothing to do, don't confuse users by logging a no-op handoff
>>         try
>>         {
>>             ......
>>         }
>>         finally
>>         {
>>             queuedDeliveries.remove(endpoint);
>>         }
>> }
>> 
>> Regards
>> /Fredrik
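
For reference, the change suggested in the quoted message (moving the isEmpty() check inside
the try block so the finally clause always removes the endpoint) could look roughly like the
sketch below. This is a simplified stand-alone model, not the actual HintedHandOffManager:
the HintStore interface, the String endpoints, and the deliveries counter are assumptions
made only so the example compiles on its own.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the queuedDeliveries bookkeeping.
public class HintDeliverySketch {
    /** Assumed stand-in for the hints column family store. */
    public interface HintStore {
        boolean isEmpty();
    }

    private final Set<String> queuedDeliveries = ConcurrentHashMap.newKeySet();
    private final HintStore hintStore;
    public int deliveries = 0; // counts passes that actually had hints to deliver

    public HintDeliverySketch(HintStore hintStore) {
        this.hintStore = hintStore;
    }

    /** Returns false if a delivery to this endpoint is already queued. */
    public boolean deliverHints(String endpoint) {
        if (!queuedDeliveries.add(endpoint))
            return false;
        deliverHintsToEndpoint(endpoint);
        return true;
    }

    private void deliverHintsToEndpoint(String endpoint) {
        try {
            if (hintStore.isEmpty())
                return; // early return, but the finally block below still runs
            deliveries++; // placeholder for the real delivery work
        } finally {
            // Always release the endpoint so later deliveries are not blocked.
            queuedDeliveries.remove(endpoint);
        }
    }

    public boolean isQueued(String endpoint) {
        return queuedDeliveries.contains(endpoint);
    }
}
```

With the check inside the try block, an empty hint store on one pass no longer leaves the
endpoint stuck in queuedDeliveries, so a later deliverHints call for the same endpoint is
not silently skipped.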
