Message-ID: <2072447613.1246583507187.JavaMail.jira@brutus>
Date: Thu, 2 Jul 2009 18:11:47 -0700 (PDT)
From: "Andrew Purtell (JIRA)"
To: hbase-dev@hadoop.apache.org
Reply-To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)
In-Reply-To: <2033816662.1239214212898.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726759#action_12726759 ]

Andrew Purtell commented on HBASE-1316:
---------------------------------------

The trouble here is ephemeral nodes expiring due to dropped heartbeats. I think this issue should be about solving that problem only. The rest almost does not matter -- Java land is blocked anyway, and watcher events will queue up.

Also, is calling up into Java land from JNI while the VM is in a GC cycle safe? It must be. Then I presume that if you tried to create an object in the C thread, the create would block somehow on an OS-level mutex until it is safe to create objects again. Would that not defeat the purpose of the C thread in the first place?

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses, especially since our environment was the worst case for garbage collection: millions of tiny, short-lived objects. I suspect HBase sees similar workloads frequently, if not constantly. With anything shorter than a 30 second session timeout, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery, so we ended up writing a lightweight wrapper around the C API and used SWIG to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event, and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
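
The session-timeout reasoning in the quoted report can be illustrated with a toy model. This is only a sketch and not ZooKeeper code: the class `SessionExpirySketch`, the `Pause` type, the fixed heartbeat interval, and the pause lengths are all hypothetical, chosen to show why a long stop-the-world pause expires a 30 second session while a 60 second session survives it. The model assumes a heartbeat that comes due during a pause is sent only when the pause ends, and that the server expires the session once the gap between heartbeats exceeds the timeout.

```java
import java.util.Arrays;
import java.util.List;

class SessionExpirySketch {
    /** Hypothetical stop-the-world pause covering [startMs, startMs + lengthMs). */
    static final class Pause {
        final long startMs;
        final long lengthMs;

        Pause(long startMs, long lengthMs) {
            this.startMs = startMs;
            this.lengthMs = lengthMs;
        }

        long endMs() {
            return startMs + lengthMs;
        }
    }

    /**
     * Returns true if some gap between consecutive heartbeats exceeds timeoutMs.
     * Heartbeats come due every intervalMs (an assumed, simplified schedule);
     * one that comes due inside a pause is delayed until that pause ends.
     */
    static boolean sessionExpires(long timeoutMs, long intervalMs, long runMs, List<Pause> pauses) {
        long lastSentMs = 0; // session assumed established at t = 0
        for (long dueMs = intervalMs; dueMs <= runMs; dueMs += intervalMs) {
            long sentMs = dueMs;
            for (Pause p : pauses) {
                if (dueMs >= p.startMs && dueMs < p.endMs()) {
                    sentMs = Math.max(sentMs, p.endMs());
                }
            }
            if (sentMs - lastSentMs > timeoutMs) {
                return true; // the server would have expired the session
            }
            lastSentMs = Math.max(lastSentMs, sentMs);
        }
        return false;
    }

    public static void main(String[] args) {
        // One made-up 45 second pause starting 5 seconds into the run.
        List<Pause> pauses = Arrays.asList(new Pause(5_000, 45_000));
        System.out.println(sessionExpires(30_000, 10_000, 120_000, pauses)); // true
        System.out.println(sessionExpires(60_000, 10_000, 120_000, pauses)); // false
    }
}
```

In this model, a heartbeat thread that the collector cannot stall corresponds to passing an empty pause list, in which case no gap exceeds the interval and the session never expires. That is the effect the native wrapper is after.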