From: Mikael Wikblom
Date: Wed, 02 May 2012 16:10:26 +0200
To: user@cassandra.apache.org
Subject: Re: Memtable.flushAndSignal "hangs" ColumnFamilyStore.maybeSwitchMemtable on IOException

OK. I just find it a bit hard to be forced to shut down the node in case of an IOException, but I understand why.

The exception was caused by a missing native snappy library on the server, but the error only occurred because we initialized a column family incorrectly (we are running Cassandra embedded and working directly against the internal APIs).

Regards
Mikael Wikblom

On 05/02/2012 03:03 PM, Sylvain Lebresne wrote:
> On Wed, May 2, 2012 at 2:42 PM, Mikael Wikblom
> wrote:
>> Given an IOException in writeSortedContents the latch.countDown() will not
>> be called. Wouldn't it be better to place the latch.countDown() in the
>> finally statement?
> No because having the latch being countDown means 'the sstable has
> been flushed successfully and the data can be safely deleted in the
> commit log', which is not the case if you get an IOException.
>
>> We've had issues with IOExceptions in writeSortedContents when doing a
>> snapshot which hung a thread (and still hangs) for 4 days.
> It would be interesting to know what triggered the IOException. If
> that's due to a bug, then that's the one we should fix in priority. If
> that's you running out of disk-space or something like that, you
> should probably fix that and restart C*.
>
> --
> Sylvain

-- 
Mikael Wikblom
Software Architect
SiteVision AB
019-217058
mikael.wikblom@sitevision.se
http://www.sitevision.se
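
P.S. For anyone reading this thread later, here is a minimal sketch of the latch semantics Sylvain describes: the latch is counted down only after a successful write, so a released latch always means "flushed, safe to discard from the commit log". This is not the actual Cassandra code. The class FlushLatchSketch, the boolean `fail` switch and the method safeToDiscardCommitLog are hypothetical names made up for illustration, and the timed await is only there so the demo itself does not hang the way our thread did.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class FlushLatchSketch {

    // Hypothetical stand-in for writeSortedContents; fails on demand.
    static void writeSortedContents(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated flush failure");
        }
    }

    // Counts the latch down only after a successful write. It is
    // deliberately NOT done in a finally block: a released latch must
    // mean "the data is safely on disk".
    static boolean flushAndSignal(CountDownLatch latch, boolean fail) {
        try {
            writeSortedContents(fail);
            latch.countDown();
            return true;
        } catch (IOException e) {
            return false; // latch stays closed on failure
        }
    }

    // Waiter side (assumption: a bounded wait, used here only so the
    // demo terminates; a plain await() would block forever on failure).
    public static boolean safeToDiscardCommitLog(boolean fail)
            throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        flushAndSignal(latch, fail);
        return latch.await(100, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "flush ok: true"
        System.out.println("flush ok: " + safeToDiscardCommitLog(false));
        // prints "flush failed: false"
        System.out.println("flush failed: " + safeToDiscardCommitLog(true));
    }
}
```

Moving countDown() into finally would make the second call return true as well, which is exactly the case where commit log data could be discarded without ever having reached an sstable.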