From: Apache Wiki
To: Apache Wiki
Reply-To: dev@cassandra.apache.org
Date: Tue, 19 Jul 2011 20:18:45 -0000
Message-ID: <20110719201845.41039.80699@eos.apache.org>
Subject: [Cassandra Wiki] Update of "FAQ" by thepaul

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by thepaul:
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=123&rev2=124

Comment:
document what i've found about ubuntu/ec2/jna/memlock "task blocked for more than 120 seconds" problems

   * [[#seed_spof|Does single seed mean single point of failure?]]
   * [[#jconsole_array_arg|Why can't I call jmx method X on jconsole? (ex. getNaturalEndpoints)]]
   * [[#max_key_size|What's the maximum key size permitted?]]
+  * [[#ubuntu_ec2_hangs|I'm using Ubuntu on EC2 with JNA, and holy crap weird things keep hanging and stalling and printing scary tracebacks in dmesg!]]

  <>

@@ -476, +477 @@

  Routing is O(N) of the key size and querying and updating are O(N log N). In practice these factors are usually dwarfed by other overhead, but some users with very large "natural" keys use their hashes instead to cut down the size.

+ <>
+
+ == I'm using Ubuntu on EC2 with JNA, and holy crap weird things keep hanging and stalling and blocking and printing scary tracebacks in dmesg! ==
+
+ We have come across several different, but similar, sets of symptoms that might match what you're seeing. They might all have the same root cause; it's not clear. One common piece is messages like this in dmesg:
+
+ {{{
+ INFO: task (some_taskname):(some_pid) blocked for more than 120 seconds.
+ "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
+ }}}
+
+ It does not seem that anyone has had the time to track this down to the real root cause, but it does seem that upgrading the linux-image-virtual package and rebooting your instances fixes it. There is likely some bug in several of the virtual/xen kernel builds distributed by Ubuntu which is fixed in later versions. Versions of linux-image-*-virtual which are known not to have this problem include:
+
+  * linux-image-2.6.38-10-virtual (2.6.38-10.46) (Ubuntu 11.04/Natty Narwhal)
+  * linux-image-2.6.35-24-virtual (2.6.35-24.42) (Ubuntu 10.10/Maverick Meerkat)
+
+ Uninstalling libjna-java or recompiling Cassandra with CLibrary.tryMlockall()'s mlockall() call commented out also makes at least some sorts of this problem go away, but that's a much less desirable fix.
+
+ If you have more information on the problem and better ways to avoid it, please do update this space.
+
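
To check whether a node shows the symptom described in the new FAQ entry, one option is to scan saved `dmesg` output for the "blocked for more than N seconds" warning. The following is a minimal sketch, not part of the wiki change: the function name `find_hung_tasks`, the regular expression, and the sample log lines (including the bracketed timestamp prefix and the `java:4321` task) are illustrative assumptions based only on the message format quoted in the entry.

```python
import re

# Assumed pattern, modelled on the message quoted in the FAQ entry:
#   INFO: task (some_taskname):(some_pid) blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return a list of (task_name, pid, seconds) for each hung-task warning."""
    hits = []
    for line in dmesg_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


if __name__ == "__main__":
    # Made-up sample text; on a real node this would come from saved dmesg output.
    sample = (
        "[ 1234.567890] INFO: task java:4321 blocked for more than 120 seconds.\n"
        '[ 1234.567891] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" '
        "disables this message.\n"
    )
    for name, pid, secs in find_hung_tasks(sample):
        print(f"task {name} (pid {pid}) was blocked for more than {secs} seconds")
```

If the function returns any hits on an affected instance, comparing the installed linux-image-*-virtual package version against the known-good versions listed in the entry would be the natural next step.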