incubator-cassandra-user mailing list archives

From Frank Ng <buzzt...@gmail.com>
Subject Re: user Digest of: get.23021
Date Thu, 26 Apr 2012 18:21:04 GMT
I am having the same issue in 1.0.7 with leveled compaction.  It seems that
the repair is flaky.  It either completes relatively fast in a TEST
environment (7 minutes) or gets stuck trying to receive a merkle tree from
a peer that is already sending it the merkle tree.

The only solution is to restart Cassandra, but that's not good.

On Thu, Apr 26, 2012 at 2:12 PM, <user-help@cassandra.apache.org> wrote:

>
> user Digest of: get.23021
>
> Topics (messages 23021 through 23021)
>
> repair waiting for something
>        23021 by: Igor
>
>
>
>
> ----------------------------------------------------------------------
>
>
>  Hi,
>
> 10-node Cassandra 1.0.3 cluster, several DCs. The weekly nodetool repair has
> been stuck for an unusually long time on node 10.254.237.2.
>
> Log output on this node:
>  INFO 11:19:42,045 Starting repair command #1, repairing 5 ranges.
>  INFO 11:19:42,053 [repair #040aae00-28a1-11e1-0000-e378018944ff] new
> session: will sync localhost/10.254.237.2, /10.254.221.2, /10.253.2.2,
> /10.254.217.2, /10.254.94.2 on range
> (85070591730234615865843651857942052864,85070591730234615865843651857942052865]
> for meter.[eventschema, schema, ids, transaction]
>  INFO 11:19:42,055 [repair #040aae00-28a1-11e1-0000-e378018944ff] requests
> for merkle tree sent for eventschema (to [/10.253.2.2, /10.254.221.2,
> localhost/10.254.237.2, /10.254.217.2, /10.254.94.2])
>  INFO 11:19:42,063 Enqueuing flush of Memtable-eventschema@1509399856(18748/23435
> serialized/live bytes, 4 ops)
>  INFO 11:19:42,063 Writing Memtable-eventschema@1509399856(18748/23435
> serialized/live bytes, 4 ops)
>  INFO 11:19:42,072 Completed flushing
> /spool1/cassandra/data/meter/eventschema-hb-40-Data.db (4745 bytes)
>  INFO 11:19:42,073 Discarding obsolete commit
> log:CommitLogSegment(/var/lib/cassandra/commitlog/CommitLog-1324019623060.log)
>  INFO 11:19:42,076 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received
> merkle tree for eventschema from localhost/10.254.237.2
>  INFO 11:19:42,102 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received
> merkle tree for eventschema from /10.254.221.2
>  INFO 11:19:42,128 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received
> merkle tree for eventschema from /10.254.217.2
>  INFO 11:19:42,228 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received
> merkle tree for eventschema from /10.253.2.2
>
> And nothing after that for a long time. So the node sent requests for trees
> to the other nodes and received all of them except the one from 10.254.94.2.
>
> On that 10.254.94.2 node:
> INFO 11:19:42,083 [repair #040aae00-28a1-11e1-0000-e378018944ff] Sending
> completed merkle tree to /10.254.237.2 for (meter,eventschema)
>
> So the merkle tree was lost somewhere. Will this wait end on its own, or do
> I need to restart the node?
>
>
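
For anyone hitting the same hang, a small script can show which peer the
repairing node is still waiting on. The sketch below is only an illustration:
it assumes the log messages are worded like the 1.0.x lines quoted above (one
message per line, not wrapped as in this email), and the names
missing_merkle_trees and peer_ip are made up for this example. They are not
part of Cassandra or nodetool.

```python
import re
from collections import defaultdict

# Patterns modeled on the log lines quoted in this thread (Cassandra 1.0.x).
# Assumption: other versions may word these messages differently.
REQUESTED = re.compile(
    r"\[repair #(?P<session>[0-9a-f-]+)\] requests for merkle tree sent "
    r"for (?P<cf>\w+) \(to \[(?P<peers>[^\]]*)\]\)"
)
RECEIVED = re.compile(
    r"\[repair #(?P<session>[0-9a-f-]+)\] Received merkle tree "
    r"for (?P<cf>\w+) from (?P<peer>\S+)"
)


def peer_ip(endpoint):
    """Reduce 'localhost/10.254.237.2' or '/10.254.94.2' to the bare IP."""
    return endpoint.strip().rsplit("/", 1)[-1]


def missing_merkle_trees(log_text):
    """Return {(session, column_family): set of peer IPs still awaited}."""
    pending = defaultdict(set)
    for line in log_text.splitlines():
        m = REQUESTED.search(line)
        if m:
            key = (m.group("session"), m.group("cf"))
            peers = m.group("peers").split(",")
            pending[key].update(peer_ip(p) for p in peers if p.strip())
            continue
        m = RECEIVED.search(line)
        if m:
            key = (m.group("session"), m.group("cf"))
            pending[key].discard(peer_ip(m.group("peer")))
    # Keep only the requests that still have unanswered peers.
    return {key: peers for key, peers in pending.items() if peers}


# The lines from the original report, joined back into one message per line.
SAMPLE = """\
 INFO 11:19:42,055 [repair #040aae00-28a1-11e1-0000-e378018944ff] requests for merkle tree sent for eventschema (to [/10.253.2.2, /10.254.221.2, localhost/10.254.237.2, /10.254.217.2, /10.254.94.2])
 INFO 11:19:42,076 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received merkle tree for eventschema from localhost/10.254.237.2
 INFO 11:19:42,102 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received merkle tree for eventschema from /10.254.221.2
 INFO 11:19:42,128 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received merkle tree for eventschema from /10.254.217.2
 INFO 11:19:42,228 [repair #040aae00-28a1-11e1-0000-e378018944ff] Received merkle tree for eventschema from /10.253.2.2
"""

if __name__ == "__main__":
    for (session, cf), peers in missing_merkle_trees(SAMPLE).items():
        print(session, cf, sorted(peers))
```

Run against the lines from the original report, this leaves 10.254.94.2 as the
only peer whose tree never arrived for eventschema, which matches what Igor
saw by eye. It only tells you which node to look at; it does not explain why
the tree was lost.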
