From: Terje Marthinussen
Subject: Re: Hinted handoff bug?
Date: Fri, 2 Dec 2011 08:59:03 +0900
To: "user@cassandra.apache.org"
Cc: "user@cassandra.apache.org"

Sorry for not checking the source to see whether things have changed, but I just remembered an issue I forgot to file a JIRA ticket for.

In the old days, nodes would periodically try to deliver their queued hints. At some stage this was changed so that hints are only delivered when a node is marked up.

However, you can definitely have a scenario where A fails to deliver to B, so it sends the hint to C instead. B is not really down; it just could not accept that packet at that time. C always (correctly, in this case) thinks B is up, so it never tries to deliver the hints to B.

Will this change fix this, or do we need to bring back the thread that periodically tried to deliver hints regardless of node status changes?

Regards,
Terje

On 1 Dec 2011, at 19:10, Sylvain Lebresne wrote:

> You're right, good catch.
> Do you mind opening a ticket on jira
> (https://issues.apache.org/jira/browse/CASSANDRA)?
>
> --
> Sylvain
>
> On Thu, Dec 1, 2011 at 10:03 AM, Fredrik L Stigbäck
> wrote:
>> Hi,
>> We're running Cassandra 1.0.3.
>> I've done some testing with 2 nodes (node A, node B), replication factor 2.
>> I take node A down, write some data to node B, and then bring node A back up.
>> Sometimes hints aren't delivered when node A comes up.
>>
>> I've done some debugging in org.apache.cassandra.db.HintedHandOffManager, and
>> sometimes node B ends up in a strange state in the method
>> org.apache.cassandra.db.HintedHandOffManager.deliverHints(final InetAddress to),
>> where org.apache.cassandra.db.HintedHandOffManager.queuedDeliveries
>> already has node A in its Set, and therefore no hints will ever be delivered
>> to node A.
>> The only reason for this that I can see is that in
>> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(InetAddress endpoint)
>> the hintStore.isEmpty() check returns true and the endpoint (node A)
>> isn't removed from
>> org.apache.cassandra.db.HintedHandOffManager.queuedDeliveries. Then no hints
>> will ever be delivered again until node B is restarted.
>> Under what conditions will hintStore.isEmpty() return true?
>> Shouldn't the hintStore.isEmpty() check be inside the try {} finally {}
>> clause, so that the endpoint is removed from queuedDeliveries in the finally block?
>>
>> public void deliverHints(final InetAddress to)
>> {
>>     logger_.debug("deliverHints to {}", to);
>>     if (!queuedDeliveries.add(to))
>>         return;
>>     .......
>> }
>>
>> private void deliverHintsToEndpoint(InetAddress endpoint) throws
>>         IOException, DigestMismatchException, InvalidRequestException,
>>         TimeoutException,
>> {
>>     ColumnFamilyStore hintStore =
>>         Table.open(Table.SYSTEM_TABLE).getColumnFamilyStore(HINTS_CF);
>>     if (hintStore.isEmpty())
>>         return; // nothing to do, don't confuse users by logging a no-op handoff
>>     try
>>     {
>>         ......
>>     }
>>     finally
>>     {
>>         queuedDeliveries.remove(endpoint);
>>     }
>> }
>>
>> Regards
>> /Fredrik
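
For illustration, the change Fredrik suggests can be sketched in isolation. This is only a minimal stand-alone model, not the Cassandra code: the class HintDeliverySketch, the String endpoints, the integer hint counter, and the synchronous call from deliverHints are all assumptions made to keep the example runnable. It only shows that with the emptiness check inside the try block, the endpoint is always removed from queuedDeliveries, so a later delivery attempt is not skipped.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for HintedHandOffManager; not the real Cassandra class.
class HintDeliverySketch {
    // Endpoints that currently have a delivery queued or running.
    private final Set<String> queuedDeliveries = ConcurrentHashMap.newKeySet();
    // Assumed stand-in for the hints column family: just a count of stored hints.
    private final AtomicInteger storedHints = new AtomicInteger();
    private final AtomicInteger delivered = new AtomicInteger();

    void storeHint() {
        storedHints.incrementAndGet();
    }

    // Returns false if a delivery to this endpoint is already queued.
    boolean deliverHints(String to) {
        if (!queuedDeliveries.add(to))
            return false;
        // Simplification: run synchronously instead of submitting to an executor.
        deliverHintsToEndpoint(to);
        return true;
    }

    private void deliverHintsToEndpoint(String endpoint) {
        try {
            // The emptiness check now sits inside the try block, so the
            // early return still passes through the finally block.
            if (storedHints.get() == 0)
                return;
            delivered.addAndGet(storedHints.getAndSet(0));
        } finally {
            queuedDeliveries.remove(endpoint);
        }
    }

    boolean isQueued(String endpoint) {
        return queuedDeliveries.contains(endpoint);
    }

    int deliveredCount() {
        return delivered.get();
    }

    public static void main(String[] args) {
        HintDeliverySketch manager = new HintDeliverySketch();
        manager.deliverHints("nodeA");                        // hint store is empty
        System.out.println(manager.isQueued("nodeA"));        // prints "false"
        manager.storeHint();
        manager.deliverHints("nodeA");                        // not blocked by a stale entry
        System.out.println(manager.deliveredCount());         // prints "1"
    }
}
```

With the check placed before the try block instead, the first call would leave "nodeA" in the set and the second call would return without delivering anything, which matches the behaviour described in the quoted report.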