From: Cyril Scetbon
Date: Tue, 29 May 2012 16:25:18 +0200
To: user@hbase.apache.org
Subject: hosts unreachables
Hi,

I've installed HBase with the following configuration:

  12 x (HBase REST server + HBase region server + Hadoop datanode)
   2 x (ZooKeeper + HBase master)
   1 x (ZooKeeper + HBase master + Hadoop namenode)

The OS is Ubuntu Lucid (10.04).

The issue is that when I load data through the REST API, some hosts become unreachable even though they still answer ping. I can no longer connect to them, and even monitoring tools stop working for a while. For example, I run SAR on each host, and you can see that no samples were recorded between 7:10 and 7:35 PM:

              CPU   %user   %nice %system %iowait  %steal   %idle
06:45:01 PM   all    0.18    0.00    0.37    3.61    0.25   95.58
06:45:01 PM     0    0.24    0.00    0.54    6.62    0.35   92.25
06:45:01 PM     1    0.12    0.00    0.20    0.61    0.15   98.92
06:50:02 PM   all    5.69    0.00    1.79    4.23    1.94   86.36
06:50:02 PM     0    5.68    0.00    3.00    7.91    2.21   81.21
06:50:02 PM     1    5.70    0.00    0.59    0.55    1.66   91.51
06:55:01 PM   all    0.68    0.00    0.14    1.62    0.23   97.33
06:55:01 PM     0    0.87    0.00    0.20    3.19    0.31   95.44
06:55:01 PM     1    0.49    0.00    0.08    0.05    0.15   99.22
06:58:36 PM   all    0.03    0.00    0.02    0.45    0.07   99.43
06:58:36 PM     0    0.01    0.00    0.02    0.40    0.13   99.43
06:58:36 PM     1    0.04    0.00    0.01    0.51    0.00   99.43
07:05:01 PM   all    0.03    0.00    0.00    0.10    0.07   99.80
07:05:01 PM     0    0.02    0.00    0.00    0.10    0.10   99.78
07:05:01 PM     1    0.04    0.00    0.01    0.09    0.03   99.83  <--- last sample before the host became unreachable
07:40:07 PM   all   14.72    0.00   17.93    0.02   13.31   54.02  <--- first sample after the host became reachable again
07:40:07 PM     0   29.43    0.00   35.87    0.00   26.57    8.13
07:40:07 PM     1    0.00    0.00    0.00    0.04    0.04   99.91
07:45:01 PM   all    0.55    0.00    0.25    0.04    0.27   98.89
07:45:01 PM     0    0.54    0.00    0.14    0.05    0.21   99.07
07:45:01 PM     1    0.55    0.00    0.36    0.04    0.33   98.72
07:50:01 PM   all    0.11    0.00    0.05    0.18    0.06   99.60
07:50:01 PM     0    0.12    0.00    0.06    0.13    0.09   99.60
07:50:01 PM     1    0.11    0.00    0.04    0.23    0.04   99.59
07:55:01 PM   all    0.00    0.00    0.01    0.05    0.07   99.88
07:55:01 PM     0    0.00    0.00    0.01    0.01    0.13   99.84
07:55:01 PM     1    0.00    0.00    0.00    0.08    0.00   99.91
08:05:01 PM   all    0.01    0.00    0.00    0.00    0.05   99.94
08:05:01 PM     0    0.00    0.00    0.00    0.00    0.08   99.91
08:05:01 PM     1    0.03    0.00    0.00    0.00    0.01   99.96

I suspect it's caused by high load, but I have no proof :(

Is there a known bug about this? I had a similar issue with Cassandra that forced me to upgrade to a Linux kernel > 3.0.

Thanks.

-- 
Cyril SCETBON
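To check the SAR history on every host for outages like the one above, the timestamps of the aggregate ("all") rows can be scanned for gaps much longer than the normal sampling interval. Below is a minimal sketch, assuming the rows use the 12-hour `sar -u -P ALL` layout quoted in this message and that sysstat samples every 5 minutes; the helper names `sample_times` and `find_gaps` and the 15-minute threshold are illustrative choices, not part of sysstat.

```python
from datetime import datetime, timedelta

# 'all' rows copied from the SAR output above (columns: time, AM/PM, CPU,
# %user, %nice, %system, %iowait, %steal, %idle).
SAR_ROWS = """\
06:45:01 PM all 0.18 0.00 0.37 3.61 0.25 95.58
06:50:02 PM all 5.69 0.00 1.79 4.23 1.94 86.36
06:55:01 PM all 0.68 0.00 0.14 1.62 0.23 97.33
06:58:36 PM all 0.03 0.00 0.02 0.45 0.07 99.43
07:05:01 PM all 0.03 0.00 0.00 0.10 0.07 99.80
07:40:07 PM all 14.72 0.00 17.93 0.02 13.31 54.02
07:45:01 PM all 0.55 0.00 0.25 0.04 0.27 98.89
07:50:01 PM all 0.11 0.00 0.05 0.18 0.06 99.60
07:55:01 PM all 0.00 0.00 0.01 0.05 0.07 99.88
08:05:01 PM all 0.01 0.00 0.00 0.00 0.05 99.94
"""


def sample_times(lines):
    """Parse the timestamp of every aggregate ('all') CPU row."""
    times = []
    for line in lines:
        fields = line.split()
        # Skip header lines, per-CPU rows, "Average:" lines and blanks.
        if len(fields) < 3 or fields[2] != "all":
            continue
        times.append(datetime.strptime(fields[0] + " " + fields[1], "%I:%M:%S %p"))
    return times


def find_gaps(lines, threshold=timedelta(minutes=15)):
    """Return (last_before, first_after) pairs where consecutive samples are
    more than `threshold` apart (hypothetical default: 3x a 5-minute interval).
    Assumes all rows come from the same day, so midnight is not handled."""
    times = sample_times(lines)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > threshold]


if __name__ == "__main__":
    for before, after in find_gaps(SAR_ROWS.splitlines()):
        print(f"no samples between {before:%H:%M:%S} and {after:%H:%M:%S} "
              f"({after - before} gap)")
```

A fixed threshold keeps the check simple; lowering it (for example to 7 minutes) would also flag single missed samples such as the step from 07:55:01 to 08:05:01 in the rows above.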