From: Jonathan Ellis
Date: Wed, 15 Jun 2011 20:06:19 -0500
Subject: Re: What triggers hint delivery?
To: user@cassandra.apache.org

You're right, those could all cause what you are seeing.

We used to have a "re-check hourly" scheduled task, but took it out because it was very, very performance-intensive: at the time, hints were not stored by machine, so asking "does machine X have any hints?" required scanning all hints. Should be fine to add that back now.

On Wed, Jun 15, 2011 at 7:48 PM, Terje Marthinussen wrote:
> I suspect a few possibilities:
> 1. I have not checked, but what happens (in terms of hint delivery) if a
> node tries to write something but the write times out even though the
> node is marked as up?
> 2. I would assume there can be ever so slight variations in how different
> nodes in the cluster think the rest of the cluster is up. These events will
> of course typically be short-lived (unless some sort of long-term split-brain
> situation occurs), but if you are writing data while, for instance, a
> node is restarting, I would not be surprised if there are race conditions
> where A sees B as down and sends a hint to C, but C already thinks B is up.
> 3. I have observed situations where a node seems to come up, but for some
> reason takes a while to become fully operational. Hint delivery
> fails, the hint sender gives up, and nothing more happens.
>
> Maybe it would be an idea to let a node check whether it has hints on
> heartbeats (perhaps not on every heartbeat, but at a regular interval)?
>
> Terje
>
> On Thu, Jun 16, 2011 at 2:08 AM, Jonathan Ellis wrote:
>>
>> On Wed, Jun 15, 2011 at 10:53 AM, Terje Marthinussen
>> wrote:
>> > I took a quick look at the source code tonight.
>> > As far as I could see from a quick code scan, hint delivery is only
>> > triggered by a node's state change from down to up?
>>
>> Right.
>>
>> > If this is indeed the case, it would potentially explain why we
>> > sometimes have hints on machines that never seem to get played back.
>>
>> Why is that? Hints don't get created in the first place unless a node
>> is in the down state.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com