From: Apache Wiki
To: core-commits@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Wed, 06 May 2009 16:59:36 -0000
Message-ID: <20090506165936.9134.52403@eos.apache.org>
Subject: [Hadoop Wiki] Update of "ZooKeeper/Troubleshooting" by PatrickHunt

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by PatrickHunt:
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting

------------------------------------------------------------------------------
  ZooKeeper is a canary in a coal mine of sorts. Because of the heart-beating performed by the clients and servers, ZooKeeper-based applications are very sensitive to things like network and system latencies. We often see client disconnects and session expirations associated with these types of problems. Take a look at [http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_commonProblems this section] to start.
+
+ == Client disconnects due to client side swapping ==
+
+ [http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_commonProblems This link] specifically discusses the negative impact of swapping in the context of the server. However, this can be an issue for clients as well. Swapping will delay, or potentially even stop for a significant period, the heartbeats from client to server, resulting in session expirations.
+
+ As told by a user:
+
+ "This [https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706402#action_12706402 issue] is clearly linked to heavy utilization or swapping on the clients. I find that if I keep the clients from swapping that this error materializes relatively infrequently, and when it does materialize it is linked to a sudden increase in load. For example, the concurrent start of 100 clients on 14 machines will sometimes trigger this issue. <...> All in all, it is my sense that Java processes must avoid swapping if they want to have not just timely but also reliable behavior."
+
  === Hardware misconfiguration - NIC ===