From: Vidhyashankar Venkataraman
To: "user@hbase.apache.org"
Date: Mon, 31 Jan 2011 09:54:52 -0800
Subject: Re: Unresponsive master in Hbase 0.90.0

The Hbase cluster doesn't have the master problems with hadoop-append turned on: we will try finding out why it wasn't working with a non-append version of hadoop (with a previous version of hadoop, it was getting stuck while splitting logs).

But there are other issues now (with append turned on) which we are trying to resolve. The region server that's hosting the META region is getting choked after a table was loaded with around 100 regions per server (this is likely the target load that we wanted to have and this worked in 0.89 with the same number of nodes and Hbase 0.90 worked fine with 40 nodes and that's why I started straight with this number). The node can be pinged, but not accessible through ssh and I am unable to perform most hbase operations on the cluster as a result.

Can the RS hosting META be a potential bottleneck in the system at all? (I will try shutting down that particular node and see what happens).

Vidhya

On 1/28/11 3:49 PM, "Vidhyashankar Venkataraman" wrote:

64 bit Java 1.6.

Why is the master even trying to issue a split with an empty log/region in hand?
( private List splitLog(final FileStatus[] logfiles) )

V

On 1/28/11 3:06 PM, "Todd Lipcon" wrote:

The 16000 second sleep is really strange... never seen anything like it.
What JVM are you running?

-Todd

On Fri, Jan 28, 2011 at 11:29 AM, Stack wrote:

> On Fri, Jan 28, 2011 at 11:23 AM, Vidhyashankar Venkataraman wrote:
> > We are working on trying to fix this (cc'ed Adam as well).
> >
> >>> Hmm.. maybe before you restart remove the directory
> >>> hdfs://b3110120.yst.yahoo.net:4600/hbase/.logs/ completely so no files
> >>> to be processed on restart.
> >
> > This one, I had tried during one of the attempts: and it created new logs
> > directory and still hung at some point which I think was the same point. (I
> > will have to dig in to see what exactly happened).
> >
> > We haven't yet looked at that part of the code, but why is the master even
> > trying to issue a split with an empty log/region in hand?
> >
>
> Can you tar up one of these regionserver dirs and put it somewhere I
> can pull? I'll try it over here.
> St.Ack
>

--
Todd Lipcon
Software Engineer, Cloudera