Date: Wed, 10 Dec 2008 10:35:30 -0800
From: stack
To: hbase-user@hadoop.apache.org
Subject: Re: Sometimes hbase do not handle any request at all
Which HBase version are you running, Yunqing?

When it hangs, can you see which region it fails on? That should be in the exception when the reduce fails. Can you see why the reduce fails? Is the TaskTracker timing it out after 10 minutes, or is it timing out on a particular HBase region?

If you can figure out the region, see which server it's hosted on (use the UI or the master logs to find this). Then go to that server and tail its logs. Can you figure out what it's doing? Is it stuck? Thread dump it a few times and see if you can find where it's blocked; you can thread dump the server from the UI. When you thread dump via the UI, the dump is also written to the HBase regionserver log. Post the dumps to this list if you'd like us to look at them for you.

Thanks,
St.Ack

Zhou, Yunqing wrote:
> I'm using IdentityTableReducer to insert about 1M records into HBase
> on a 23-machine cluster,
> but I found that sometimes it got stuck:
> everything is suspended and every machine's load is zero.
>
> Then some reducers fail, and new reducers begin to insert some records.
> Then the phenomenon appears again.
>
> I've set every machine's nofile limit to 32768.
> Do you know why this happens?
> Thanks.
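St.Ack's advice above is to thread dump the stuck regionserver a few times and compare where its threads are blocked. For a remote regionserver, the UI (or the JDK's `jstack <pid>`) is the practical way to get those dumps. Purely to illustrate the compare-several-dumps idea, here is a minimal Java sketch that samples the threads of the JVM it runs in using the standard `java.lang.management.ThreadMXBean`. It assumes you can run it inside the process you want to inspect; the class `ThreadDumps` and its methods are hypothetical and not part of HBase.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not an HBase class): samples the threads of the current JVM.
public class ThreadDumps {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    // Render one thread dump of this JVM as text, including lock info when supported.
    public static String dumpAll() {
        StringBuilder sb = new StringBuilder();
        ThreadInfo[] infos = MX.dumpAllThreads(
                MX.isObjectMonitorUsageSupported(), MX.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Take `rounds` samples `intervalMs` apart and count, per thread name, how many
    // samples found that thread BLOCKED. Threads sharing a name are counted together.
    public static Map<String, Integer> blockedCounts(int rounds, long intervalMs) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < rounds; i++) {
            for (ThreadInfo info : MX.dumpAllThreads(false, false)) {
                if (info.getThreadState() == Thread.State.BLOCKED) {
                    counts.merge(info.getThreadName(), 1, Integer::sum);
                }
            }
            if (i < rounds - 1) {
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt, stop sampling
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.print(dumpAll());
        System.out.println("BLOCKED counts over 3 samples: " + blockedCounts(3, 1000));
    }
}
```

A thread that shows up BLOCKED in every sample, waiting on the same lock, is the first place to look; one that is BLOCKED in a single sample is usually just transient lock contention.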