Subject: Re: Missing Split (full message)
From: Stack <saint.ack@gmail.com>
To: user@hbase.apache.org
Date: Wed, 2 Jun 2010 08:25:37 -0700

On Wed, Jun 2, 2010 at 2:13 AM, Dan Harvey wrote:
> Yes, we're running Cloudera CDH2, which I've just checked includes a
> back-ported hdfs-630 patch.
>
> I guess a lot of these issues will be gone once hadoop 0.21 is out and
> hbase can take advantage of the new features.
>

That's the hope. A bunch of fixes have gone into 0.20 for hdfs issues
provoked by hbase. Look out for the append branch of hdfs 0.20 coming
soon (it'll be here:
http://svn.apache.org/viewvc/hadoop/common/branches/). It'll be a 0.20
branch with support for append (hdfs-200, hdfs-142, etc.) and other
fixes needed by hbase. That's what the next major hbase release will
ship against (CDH3 will include this stuff and then some, if I
understand Todd+crew's plans correctly).

Good on you Dan,
St.Ack

> Thanks,
>
> On 2 June 2010 01:10, Stack wrote:
>> Hey Dan:
>>
>> On Tue, Jun 1, 2010 at 2:57 AM, Dan Harvey wrote:
>>> In what cases would a datanode failure (for example running out of
>>> memory in our case) cause HBase data loss?
>>
>> We should just move past the damaged DN on to the other replicas, but
>> there are probably places where we can get hung up.
>> Out of interest, are you running with hdfs-630 in place?
>>
>>> Would it mostly only cause data loss to the meta regions, or does it
>>> also cause problems with the actual region files?
>>>
>>
>> HDFS files that had their blocks located on the damaged DN would be
>> susceptible (meta files are just like any other).
>>
>> St.Ack
>>
>>>> On Mon, May 24, 2010 at 2:39 PM, Dan Harvey wrote:
>>>>> Hi,
>>>>>
>>>>> Sorry for the multiple e-mails, it seems gmail didn't send my whole
>>>>> message last time! Anyway, here it goes again...
>>>>>
>>>>> Whilst loading data via a mapreduce job into HBase I have started
>>>>> getting this error:
>>>>>
>>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>>>>> contact region server Some server, retryOnlyOne=true, index=0,
>>>>> islastrow=false, tries=9, numtries=10, i=0, listsize=19,
>>>>> region=source_documents,ipubmed\x219915054,1274525958679 for region
>>>>> source_documents,ipubmed\x219915054,1274525958679, row 'u1012913162',
>>>>> but failed after 10 attempts.
>>>>> Exceptions:
>>>>> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1166)
>>>>> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>>>>> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>>>>>
>>>>> In the master there are the following three regions:
>>>>>
>>>>> source_documents,ipubmed\x219859228,1274701893687  hadoop1  1825870642  ipubmed\x219859228  ipubmed\x219915054
>>>>> source_documents,ipubmed\x219915054,1274525958679  hadoop4  193393334   ipubmed\x219915054  u102193588
>>>>> source_documents,u102193588,1274486550122          hadoop4  2141795358  u102193588          u105043522
>>>>>
>>>>> and on one of our 5 nodes I found a region which starts with
>>>>>
>>>>> ipubmed\x219915054 and ends with u102002564
>>>>>
>>>>> and on another I found the other half of the split, which starts with
>>>>>
>>>>> u102002564 and ends with u102193588
>>>>>
>>>>> So it seems that the middle region was split, but the split failed
>>>>> to reach the master.
>>>>>
>>>>> We've had a few problems over the last few days with hdfs nodes
>>>>> failing due to lack of memory, which has now been fixed, but that
>>>>> could have been a cause of this problem.
>>>>>
>>>>> In what ways can a split fail to be received by the master, and how
>>>>> long would it take for hbase to fix this? I've read it periodically
>>>>> scans the META table to find problems like this, but not how often.
>>>>> It has been about 12h here and our cluster doesn't appear to have
>>>>> fixed this missing split; is there a way to force the master to
>>>>> rescan the META table? Will it fix problems like this given time?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Dan Harvey | Datamining Engineer
>>>>> www.mendeley.com/profiles/dan-harvey
>>>>>
>>>>> Mendeley Limited | London, UK | www.mendeley.com
>>>>> Registered in England and Wales | Company Number 6419015
>>>>>
>>>>
>>>
>>
>
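[Archive note: the "missing split" Dan describes can be pictured as a mismatch between the region boundaries META lists and the regions the regionservers actually hold. Below is a small illustrative sketch of that consistency check, using the boundaries quoted in the thread. This is not HBase's actual META-scan code; the function name and data layout are made up for illustration only.]

```python
# Illustrative sketch only (not HBase code): model the mismatch between the
# region boundaries the master's META table lists and the regions actually
# present on the regionservers. Keys below are copied from the thread.

def stale_meta_regions(meta, live):
    """Return META entries whose (start_key, end_key) pair no longer matches
    any live region -- e.g. a split parent whose daughters never made it
    into META."""
    live_set = set(live)
    return [region for region in meta if region not in live_set]

# Boundaries as listed by the master:
meta_view = [
    ("ipubmed\\x219859228", "ipubmed\\x219915054"),
    ("ipubmed\\x219915054", "u102193588"),   # the pre-split parent
    ("u102193588", "u105043522"),
]

# Boundaries actually found on the regionservers (the parent split in two):
on_disk = [
    ("ipubmed\\x219859228", "ipubmed\\x219915054"),
    ("ipubmed\\x219915054", "u102002564"),   # first daughter
    ("u102002564", "u102193588"),            # second daughter
    ("u102193588", "u105043522"),
]

print(stale_meta_regions(meta_view, on_disk))
# The one stale entry is the parent ("ipubmed\x219915054", "u102193588"):
# clients routed there by META keep retrying against a region that no longer
# exists, which is consistent with the RetriesExhaustedException above.
```

A checker along these lines makes it clear why rescanning META (or, in later HBase versions, running a repair tool) is the fix: the daughters exist and chain correctly on disk; only the catalog entry is out of date.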