From: Aaron Morton
To: user@cassandra.apache.org
Subject: Re: HintedHandoff process does not finish
Date: Mon, 30 Sep 2013 20:44:08 +1300

  • What can be the reason for the handoff process not to finish?
Check for other errors about timing out during hint replay.
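
One way to run that check is to scan the node's system log for lines that mention both hints and a timeout. The following is a minimal Python sketch, not a definitive tool. The log path and the keyword pattern are assumptions, since the exact wording of the message varies between Cassandra versions, so adjust both to match what your nodes actually log.

```python
import re
from pathlib import Path

# Assumption: a relevant line mentions a timeout and hints, in either order.
# The exact wording differs between Cassandra versions.
_TIMEOUT = r"tim(?:e|ed|ing)\s*out"
HINT_TIMEOUT = re.compile(rf"{_TIMEOUT}.*hint|hint.*{_TIMEOUT}", re.IGNORECASE)


def find_hint_timeouts(lines):
    """Return the log lines that look like hint replay timeouts."""
    return [line.rstrip("\n") for line in lines if HINT_TIMEOUT.search(line)]


def scan_log(path="/var/log/cassandra/system.log"):  # assumed log location
    """Scan a log file and return the matching lines."""
    with Path(path).open(errors="replace") as handle:
        return find_hint_timeouts(handle)
```

Calling `scan_log()` on the node that holds the growing hints column family should show whether deliveries to the remote host are being aborted.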

  • What would be the best way to recover from this situation?
If they are really causing trouble, drop the hints via the HintedHandoffManager JMX MBean, or by stopping the node and deleting the files on disk. Then run repair later.
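
For the second option (deleting the files on disk), a cautious approach is to list what would be removed before deleting anything. The Python sketch below assumes the node is already stopped and that you pass it the directory holding the hints column family files. The example path in the comment is an assumption, so confirm it against `data_file_directories` in your cassandra.yaml. The function name is hypothetical, and it defaults to a dry run.

```python
from pathlib import Path


def remove_hint_files(hints_dir, dry_run=True):
    """List (and optionally delete) the files in a hints directory.

    Run this only while the node is stopped.
    Returns a (file_count, total_bytes) tuple.
    """
    # Only regular files are considered; subdirectories are left alone.
    files = sorted(p for p in Path(hints_dir).iterdir() if p.is_file())
    total_bytes = sum(p.stat().st_size for p in files)
    for path in files:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            path.unlink()
    return len(files), total_bytes


# Example (assumed path, check your own configuration first):
#   remove_hint_files("/var/lib/cassandra/data/system/hints")
#   remove_hint_files("/var/lib/cassandra/data/system/hints", dry_run=False)
```

Because the dropped hints are writes the remote node never received, restart the node and run repair afterwards, as noted above.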

  • What can be done to prevent this from happening again?
Hints are stored when either the node is down before the request starts or when the coordinator times out waiting for the remote node. Check the logs for nodes going down, and check the MessagingService MBean for TimedOuts from other nodes. This may indicate issues with a cross-DC connection.
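
To follow up on the log check, the Python sketch below tallies how often each endpoint is reported as down, which makes a flapping node in the other data center easier to spot. The pattern is an assumption about how gossip reports state changes (an address followed by `is now dead` or `is now DOWN`), so verify the wording for your version before relying on the counts.

```python
import re
from collections import Counter

# Assumption: gossip logs a line such as "InetAddress /10.0.0.5 is now dead."
# (or "... is now DOWN"); adjust the pattern to your version's wording.
DOWN_EVENT = re.compile(r"/(\d{1,3}(?:\.\d{1,3}){3}) is now (?:dead|DOWN)")


def count_down_events(lines):
    """Count node-down log events per endpoint address."""
    matches = (DOWN_EVENT.search(line) for line in lines)
    return Counter(m.group(1) for m in matches if m)
```

Passing an open log file to `count_down_events` returns a `Counter`, and its `most_common()` method lists the endpoints that were marked down most often.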

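Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com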

On 27/09/2013, at 11:18 PM, Tom van den Berge <tom@drillster.com> wrote:

Hi,

On one of my nodes, the (storage) load increased dramatically (doubled) within one or two hours. The hints column family was causing the growth. I noticed one HintedHandoff process that had started some two hours earlier but hadn't finished. Normally, these processes take only a few seconds, 15 seconds max, in my cluster.

The not-finishing process was handing the hints over to a host in another data center. There were no warning or error messages in the logs, other than the repeated "flushing high-traffic column family hints".
I'm using Cassandra 1.2.3.
  • What can be the reason for the handoff process not to finish?
  • What would be the best way to recover from this situation?
  • What can be done to prevent this from happening again?

Thanks in advance,
Tom