Subject: Re: Missing Split (full message)
From: Stack <saint.ack@gmail.com>
To: user@hbase.apache.org
Date: Wed, 2 Jun 2010 08:25:37 -0700

On Wed, Jun 2, 2010 at 2:13 AM, Dan Harvey wrote:
> Yes, we're running Cloudera CDH2, which I've just checked includes a
> back-ported hdfs-630 patch.
>
> I guess a lot of these issues will be gone once hadoop 0.21 is out and
> hbase can take advantage of the new features.
>

That's the hope. A bunch of fixes have gone into 0.20 for hdfs issues
provoked by hbase. Look out for the append branch of hdfs 0.20 coming
soon (it'll be here:
http://svn.apache.org/viewvc/hadoop/common/branches/). It'll be a 0.20
branch with support for append (hdfs-200, hdfs-142, etc.) and other
fixes needed by hbase. That's what the next major hbase release will
ship against (CDH3 will include this stuff and then some, if I
understand Todd+crew's plans correctly).

Good on you Dan,
St.Ack

> Thanks,
>
> On 2 June 2010 01:10, Stack wrote:
>> Hey Dan:
>>
>> On Tue, Jun 1, 2010 at 2:57 AM, Dan Harvey wrote:
>>> In what cases would a datanode failure (for example running out of
>>> memory in our case) cause HBase data loss?
>>
>> We should just move past the damaged DN on to the other replicas, but
>> there are probably places where we can get hung up.
>> Out of interest, are you running with hdfs-630 in place?
>>
>>> Would it mostly only cause data loss to the meta regions, or does it
>>> also cause problems with the actual region files?
>>>
>>
>> HDFS files that had their blocks located on the damaged DN would be
>> susceptible (meta files are just like any other).
>>
>> St.Ack
>>
>>>> On Mon, May 24, 2010 at 2:39 PM, Dan Harvey wrote:
>>>>> Hi,
>>>>>
>>>>> Sorry for the multiple e-mails, it seems gmail didn't send my whole
>>>>> message last time! Anyway, here it goes again...
>>>>>
>>>>> Whilst loading data via a mapreduce job into HBase I have started
>>>>> getting this error:
>>>>>
>>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>>>>> contact region server Some server, retryOnlyOne=true, index=0,
>>>>> islastrow=false, tries=9, numtries=10, i=0, listsize=19,
>>>>> region=source_documents,ipubmed\x219915054,1274525958679 for region
>>>>> source_documents,ipubmed\x219915054,1274525958679, row 'u1012913162',
>>>>> but failed after 10 attempts.
>>>>> Exceptions:
>>>>> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1166)
>>>>> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>>>>> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>>>>>
>>>>> In the master there are the following three regions:
>>>>>
>>>>> source_documents,ipubmed\x219859228,1274701893687  hadoop1  1825870642  ipubmed\x219859228  ipubmed\x219915054
>>>>> source_documents,ipubmed\x219915054,1274525958679  hadoop4  193393334   ipubmed\x219915054  u102193588
>>>>> source_documents,u102193588,1274486550122          hadoop4  2141795358  u102193588          u105043522
>>>>>
>>>>> and on one of our 5 nodes I found a region which starts with
>>>>>
>>>>> ipubmed\x219915054 and ends with u102002564
>>>>>
>>>>> and on another I found the other half of the split, which starts with
>>>>>
>>>>> u102002564 and ends with u102193588
>>>>>
>>>>> So it seems that the middle region was split, but the split failed
>>>>> to reach the master.
>>>>>
>>>>> We've had a few problems over the last few days with hdfs nodes
>>>>> failing due to lack of memory, which has now been fixed, but that
>>>>> could have been a cause of this problem.
>>>>>
>>>>> In what ways can a split fail to be received by the master, and how
>>>>> long would it take for hbase to fix this? I've read it periodically
>>>>> scans the META table to find problems like this, but not how often.
>>>>> It has been about 12h here and our cluster doesn't appear to have
>>>>> fixed this missing split; is there a way to force the master to
>>>>> rescan the META table? Will it fix problems like this given time?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Dan Harvey | Datamining Engineer
>>>>> www.mendeley.com/profiles/dan-harvey
>>>>>
>>>>> Mendeley Limited | London, UK | www.mendeley.com
>>>>> Registered in England and Wales | Company Number 6419015
>>>>>
>>>>
>>>
>>
>
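[Archive note: the "missing split" Dan describes can be pictured as a mismatch between the region boundaries META lists and the regions the regionservers actually hold. Below is a small illustrative sketch of that consistency check, using the boundaries quoted in the thread. This is not HBase's actual META-scan code; the function name and data layout are made up for illustration only.]

```python
# Illustrative sketch only (not HBase code): model the mismatch between the
# region boundaries the master's META table lists and the regions actually
# present on the regionservers. Keys below are copied from the thread.

def stale_meta_regions(meta, live):
    """Return META entries whose (start_key, end_key) pair no longer matches
    any live region -- e.g. a split parent whose daughters never made it
    into META."""
    live_set = set(live)
    return [region for region in meta if region not in live_set]

# Boundaries as listed by the master:
meta_view = [
    ("ipubmed\\x219859228", "ipubmed\\x219915054"),
    ("ipubmed\\x219915054", "u102193588"),   # the pre-split parent
    ("u102193588", "u105043522"),
]

# Boundaries actually found on the regionservers (the parent split in two):
on_disk = [
    ("ipubmed\\x219859228", "ipubmed\\x219915054"),
    ("ipubmed\\x219915054", "u102002564"),   # first daughter
    ("u102002564", "u102193588"),            # second daughter
    ("u102193588", "u105043522"),
]

print(stale_meta_regions(meta_view, on_disk))
# The one stale entry is the parent ("ipubmed\x219915054", "u102193588"):
# clients routed there by META keep retrying against a region that no longer
# exists, which is consistent with the RetriesExhaustedException above.
```

A checker along these lines makes it clear why rescanning META (or, in later HBase versions, running a repair tool) is the fix: the daughters exist and chain correctly on disk; only the catalog entry is out of date.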