From: aaron morton
Subject: Re: Node join streaming stuck at 100%
Date: Tue, 5 Jun 2012 09:22:38 +1200
To: user@cassandra.apache.org

Are there any errors in the logs about failed streaming?

If you are getting timeouts, 1.0.8 added a streaming socket timeout: https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L323

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 3:12 PM, koji wrote:

> aaron morton <aaron <at> thelastpickle.com> writes:
>
>> Did you restart? All good?
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 27/04/2012, at 9:49 AM, Bryce Godfrey wrote:
>>
>> This is the second node I've joined to my cluster in the last few days, and so far both have become stuck at 100% on a large file according to netstats. This is on 1.0.9. Is there anything I can do to make it move on besides restarting Cassandra? I don't see any errors or warnings in the logs for either server, and there is plenty of disk space.
>>
>> On the sender side I see this:
>>
>> Streaming to: /10.20.1.152
>>    /opt/cassandra/data/MonitoringData/PropertyTimeline-hc-80540-Data.db sections=1 progress=82393861085/82393861085 - 100%
>>
>> On the node joining I don't see this file in netstats, and all pending streams are sitting at 0%.
>
> Hi,
> we have the same problem (1.0.7). Our netstats output looks like this:
>
> Mode: NORMAL
> Streaming to: /1.1.1.1
>    /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3757-Data.db sections=1234 progress=3256666/3256666 - 100%
>    /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3641-Data.db sections=4386 progress=0/1025272214 - 0%
>    /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3761-Data.db sections=2956 progress=0/17826723 - 0%
>    /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3730-Data.db sections=3792 progress=0/56066299 - 0%
>    /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3760-Data.db sections=4384 progress=0/90941161 - 0%
>    /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3687-Data.db sections=3958 progress=0/54729557 - 0%
>    /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3762-Data.db sections=766 progress=0/2605165 - 0%
> Streaming to: /1.1.1.2
>    /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-709-Data.db sections=3228 progress=29175698/29175698 - 100%
>    /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-789-Data.db sections=2102 progress=0/618938 - 0%
>    /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-765-Data.db sections=3044 progress=0/1996687 - 0%
>    /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-788-Data.db sections=2773 progress=0/1374636 - 0%
>    /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-729-Data.db sections=3150 progress=0/22111512 - 0%
> Nothing streaming from /1.1.1.1
> Nothing streaming from /1.1.1.2
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         1       23825242
> Responses                       n/a        25       19644808
>
> After a restart the pending streams are cleared, but the next time we run "nodetool repair -pr" the streams get stuck again. This always happens on the same node (we have 12 nodes in total).
>
> koji
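For anyone hitting the same symptom, here is a rough sketch of how the pattern could be spotted in saved `nodetool netstats` output. It is only an illustration: the names `parse_outgoing` and `suspect_peers` are made up for this example, the line format is assumed from the 1.0.x output quoted in this thread and may differ in other versions, and the "file at 100% but still listed" rule is a heuristic rather than anything Cassandra reports itself. A single snapshot cannot prove a stall, so comparing two snapshots taken a few minutes apart is likely more reliable.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class StreamFile(NamedTuple):
    peer: str   # address taken from the "Streaming to:" header
    path: str   # SSTable file being streamed
    sent: int   # bytes transferred so far
    total: int  # bytes to transfer in total


# Assumed line formats, copied from the output quoted in this thread.
PEER_RE = re.compile(r"^Streaming to:\s*(\S+)\s*$")
FILE_RE = re.compile(
    r"^\s*(\S+)\s+sections=\d+\s+progress=(\d+)/(\d+)\s+-\s+\d+%\s*$"
)


def parse_outgoing(netstats_text: str) -> List[StreamFile]:
    """Collect the file lines listed under each "Streaming to:" header."""
    files: List[StreamFile] = []
    peer: Optional[str] = None
    for line in netstats_text.splitlines():
        if not line.strip():
            continue  # blank lines do not end a block
        header = PEER_RE.match(line)
        if header:
            peer = header.group(1)
            continue
        entry = FILE_RE.match(line)
        if entry and peer is not None:
            files.append(
                StreamFile(peer, entry.group(1),
                           int(entry.group(2)), int(entry.group(3)))
            )
        elif not entry:
            # Any other line (e.g. "Nothing streaming from ...") ends the block.
            peer = None
    return files


def suspect_peers(files: List[StreamFile]) -> Dict[str, Dict[str, int]]:
    """Per peer, count files listed at 100% and files with no bytes sent.

    Heuristic: only peers with at least one fully sent file that is still
    listed are returned, which is the pattern described in this thread.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"done": 0, "waiting": 0}
    )
    for f in files:
        if f.sent >= f.total:
            counts[f.peer]["done"] += 1
        elif f.sent == 0:
            counts[f.peer]["waiting"] += 1
    return {peer: dict(c) for peer, c in counts.items() if c["done"] > 0}


# Shortened excerpt of the netstats output quoted in this thread.
SAMPLE = """Mode: NORMAL
Streaming to: /1.1.1.1
   /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3757-Data.db sections=1234 progress=3256666/3256666 - 100%
   /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3641-Data.db sections=4386 progress=0/1025272214 - 0%
Streaming to: /1.1.1.2
   /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-709-Data.db sections=3228 progress=29175698/29175698 - 100%
   /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-789-Data.db sections=2102 progress=0/618938 - 0%
Nothing streaming from /1.1.1.1
Nothing streaming from /1.1.1.2
"""

if __name__ == "__main__":
    for peer, c in suspect_peers(parse_outgoing(SAMPLE)).items():
        print(f"{peer}: {c['done']} file(s) at 100%, {c['waiting']} not started")
```

To use it on a live node, the output of `nodetool netstats` could be saved to a file and passed to `parse_outgoing` as a string. Running it twice and checking that the `sent` values have not moved is what would actually suggest a hung stream.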
