From: Jonathan Ellis
Date: Thu, 1 Dec 2011 23:49:05 -0600
Subject: Re: Hinted handoff bug?
To: user@cassandra.apache.org

Nope, that's a separate issue.
https://issues.apache.org/jira/browse/CASSANDRA-3554

On Thu, Dec 1, 2011 at 5:59 PM, Terje Marthinussen wrote:
> Sorry for not checking source to see if things have changed, but I just remembered an issue I have forgotten to make a jira for.
>
> In old days, nodes would periodically try to deliver queues.
>
> However, this was at some stage changed so it only delivers if a node is being marked up.
>
> However, you can definitely have a scenario where A fails to deliver to B so it sends the hint to C instead.
>
> However, B is not really down, it just could not accept that packet at that time and C always (correctly in this case) thinks B is up and it never tries to deliver the hints to B.
>
> Will this change fix this, or do we need to get back the thread that periodically tried to deliver hints regardless of node status changes?
>
> Regards,
> Terje
>
> On 1 Dec 2011, at 19:10, Sylvain Lebresne wrote:
>
>> You're right, good catch.
>> Do you mind opening a ticket on jira
>> (https://issues.apache.org/jira/browse/CASSANDRA)?
>>
>> --
>> Sylvain
>>
>> On Thu, Dec 1, 2011 at 10:03 AM, Fredrik L Stigbäck
>> wrote:
>>> Hi,
>>> We're running cassandra 1.0.3.
>>> I've done some testing with 2 nodes (node A, node B), replication factor 2.
>>> I take node A down, writing some data to node B and then take node A up.
>>> Sometimes hints aren't delivered when node A comes up.
>>>
>>> I've done some debugging in org.apache.cassandra.db.HintedHandOffManager and
>>> sometimes node B ends up in a strange state in method
>>> org.apache.cassandra.db.HintedHandOffManager.deliverHints(final InetAddress
>>> to), where org.apache.cassandra.db.HintedHandOffManager.queuedDeliveries
>>> already has node A in its Set and therefore no hints will ever be delivered
>>> to node A.
>>> The only reason for this that I can see is that in
>>> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(InetAddress
>>> endpoint) the hintStore.isEmpty() check returns true and the endpoint (node
>>> A) isn't removed from
>>> org.apache.cassandra.db.HintedHandOffManager.queuedDeliveries. Then no hints
>>> will ever be delivered again until node B is restarted.
>>> Under what conditions will hintStore.isEmpty() return true?
>>> Shouldn't the hintStore.isEmpty() check be inside the try {} finally {}
>>> clause, removing the endpoint from queuedDeliveries in the finally block?
>>>
>>> public void deliverHints(final InetAddress to)
>>> {
>>>     logger_.debug("deliverHints to {}", to);
>>>     if (!queuedDeliveries.add(to))
>>>         return;
>>>     .......
>>> }
>>>
>>> private void deliverHintsToEndpoint(InetAddress endpoint) throws
>>> IOException, DigestMismatchException, InvalidRequestException,
>>> TimeoutException,
>>> {
>>>     ColumnFamilyStore hintStore =
>>>         Table.open(Table.SYSTEM_TABLE).getColumnFamilyStore(HINTS_CF);
>>>     if (hintStore.isEmpty())
>>>         return; // nothing to do, don't confuse users by logging a no-op handoff
>>>     try
>>>     {
>>>         ......
>>>     }
>>>     finally
>>>     {
>>>         queuedDeliveries.remove(endpoint);
>>>     }
>>> }
>>>
>>> Regards
>>> /Fredrik

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
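
For readers of the archive, here is a minimal sketch of the change Fredrik proposes: moving the empty-store check inside the try block so that the finally clause always removes the endpoint from queuedDeliveries. It is not Cassandra's code and not necessarily how the ticket was resolved. The class HintDeliverySketch, the hintStoreEmpty flag, the attempts counter and the synchronous call are all hypothetical stand-ins; only the names deliverHints, deliverHintsToEndpoint and queuedDeliveries come from the thread.

```java
import java.net.InetAddress;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the delivery bookkeeping discussed in the thread.
class HintDeliverySketch {
    // Endpoints with a delivery currently queued or running.
    private final Set<InetAddress> queuedDeliveries = ConcurrentHashMap.newKeySet();
    // Assumption: a simple flag replaces the real hintStore.isEmpty() check.
    private volatile boolean hintStoreEmpty = true;
    // Counts deliveries that got past the empty-store check.
    private int attempts = 0;

    void setHintStoreEmpty(boolean empty) { hintStoreEmpty = empty; }
    int attempts() { return attempts; }
    boolean isQueued(InetAddress endpoint) { return queuedDeliveries.contains(endpoint); }

    // Returns false when a delivery to this endpoint is already queued.
    boolean deliverHints(InetAddress to) {
        if (!queuedDeliveries.add(to))
            return false;
        // Assumption: runs inline here; the real code hands off to another thread.
        deliverHintsToEndpoint(to);
        return true;
    }

    private void deliverHintsToEndpoint(InetAddress endpoint) {
        try {
            // The early return now sits inside the try, so finally still runs.
            if (hintStoreEmpty)
                return;
            attempts++; // placeholder for the actual hint replay
        } finally {
            queuedDeliveries.remove(endpoint);
        }
    }
}
```

With the check outside the try, as in the quoted code, the first call with an empty store would leave the endpoint in the set and every later deliverHints call for it would return early.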