hbase-user mailing list archives

From Dan Harvey <dan.har...@mendeley.com>
Subject Re: Missing Split (full message)
Date Wed, 02 Jun 2010 09:13:31 GMT
Yes, we're running Cloudera CDH2, which I've just checked includes a
backported HDFS-630 patch.

I guess a lot of these issues will go away once Hadoop 0.21 is out and
HBase can take advantage of its new features.

Thanks,

On 2 June 2010 01:10, Stack <stack@duboce.net> wrote:
> Hey Dan:
>
> On Tue, Jun 1, 2010 at 2:57 AM, Dan Harvey <dan.harvey@mendeley.com> wrote:
>> In what cases would a datanode failure (for example, running out of
>> memory in our case) cause HBase data loss?
>
> We should just move past the damaged DN on to the other replicas, but
> there are probably places where we can get hung up.  Out of interest,
> are you running with HDFS-630 in place?
>
>> Would it mostly cause data loss only in the META regions, or does it
>> also cause problems with the actual region files?
>>
>
> HDFS files that had their blocks located on the damaged DN would be
> susceptible (meta files are just like any other).
>
> St.Ack
>
>>> On Mon, May 24, 2010 at 2:39 PM, Dan Harvey <dan.harvey@mendeley.com> wrote:
>>>> Hi,
>>>>
>>>> Sorry for the multiple e-mails; it seems Gmail didn't send my whole
>>>> message last time! Anyway, here it goes again...
>>>>
>>>> Whilst loading data into HBase via a MapReduce job, I have started
>>>> getting this error:
>>>>
>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>>>> contact region server Some server, retryOnlyOne=true, index=0,
>>>> islastrow=false, tries=9, numtries=10, i=0, listsize=19,
>>>> region=source_documents,ipubmed\x219915054,1274525958679 for region
>>>> source_documents,ipubmed\x219915054,1274525958679, row 'u1012913162',
>>>> but failed after 10 attempts.
>>>> Exceptions:
>>>> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1166)
>>>> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>>>> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>>>>
>>>> The master lists the following three regions:
>>>>
>>>> Region name                                          Server   Encoded name  Start key            End key
>>>> source_documents,ipubmed\x219859228,1274701893687    hadoop1  1825870642    ipubmed\x219859228   ipubmed\x219915054
>>>> source_documents,ipubmed\x219915054,1274525958679    hadoop4  193393334     ipubmed\x219915054   u102193588
>>>> source_documents,u102193588,1274486550122            hadoop4  2141795358    u102193588           u105043522
>>>>
>>>> and on one of our 5 nodes I found a region that starts with
>>>>
>>>> ipubmed\x219915054 and ends with u102002564
>>>>
>>>> and on another I found the other half of the split which starts with
>>>>
>>>> u102002564 and ends with u102193588
>>>>
>>>> So it seems that the middle region listed on the master was split, but
>>>> the split never reached the master.
>>>>
>>>> We've had a few problems over the last few days with HDFS nodes
>>>> failing due to lack of memory. That has now been fixed, but it could
>>>> have been a cause of this problem.
>>>>
>>>> In what ways can a split fail to reach the master, and how long
>>>> would it take for HBase to fix this? I've read that it periodically
>>>> scans the META table to find problems like this, but not how often.
>>>> It has been about 12h here and our cluster doesn't appear to have
>>>> fixed this missing split. Is there a way to force the master to
>>>> rescan the META table? Will it fix problems like this given time?
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Dan Harvey | Datamining Engineer
>>>> www.mendeley.com/profiles/dan-harvey
>>>>
>>>> Mendeley Limited | London, UK | www.mendeley.com
>>>> Registered in England and Wales | Company Number 6419015
>>>>
>>>
>>
>


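The diagnosis in the 24 May message (the master still lists the parent region, while two region servers hold daughter regions whose key ranges together cover exactly the parent's range) can be checked mechanically. What follows is a minimal Java sketch (Java 16+ for records), assuming the region lists have been gathered by hand from the master and from each region server. The names SplitCoverageCheck, Region, coversAsSplit and findUnrecordedSplits are hypothetical; they are not part of the HBase API and this is not how the master or any HBase tool actually detects the problem.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCoverageCheck {

    /** Hypothetical region descriptor: start key inclusive, end key exclusive. */
    record Region(String name, String startKey, String endKey) {}

    /**
     * True if regions a and b are contiguous (a ends where b starts) and
     * together span exactly the parent's range, i.e. they look like the
     * two daughters of a split of that parent.
     */
    static boolean coversAsSplit(Region parent, Region a, Region b) {
        return a.startKey().equals(parent.startKey())
                && a.endKey().equals(b.startKey())
                && b.endKey().equals(parent.endKey());
    }

    /** Returns names of META regions whose range is exactly covered by a pair of deployed regions. */
    static List<String> findUnrecordedSplits(List<Region> meta, List<Region> deployed) {
        List<String> suspects = new ArrayList<>();
        for (Region parent : meta) {
            for (Region a : deployed) {
                for (Region b : deployed) {
                    if (a != b && coversAsSplit(parent, a, b)) {
                        suspects.add(parent.name());
                    }
                }
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Key ranges as listed by the master in the thread.
        List<Region> meta = List.of(
                new Region("source_documents,ipubmed\\x219859228,1274701893687",
                        "ipubmed\\x219859228", "ipubmed\\x219915054"),
                new Region("source_documents,ipubmed\\x219915054,1274525958679",
                        "ipubmed\\x219915054", "u102193588"),
                new Region("source_documents,u102193588,1274486550122",
                        "u102193588", "u105043522"));
        // Key ranges found on the region servers; the names daughterA/daughterB are hypothetical.
        List<Region> deployed = List.of(
                new Region("daughterA", "ipubmed\\x219915054", "u102002564"),
                new Region("daughterB", "u102002564", "u102193588"));
        // Prints the middle region, the one whose split never reached the master.
        System.out.println(findUnrecordedSplits(meta, deployed));
    }
}
```

Keys are compared only for equality, so the sketch needs no row-key ordering; a real check would work on raw row-key bytes rather than on the escaped strings shown above.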