hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Regionserver fails to serve region
Date Fri, 17 Oct 2008 18:18:18 GMT
First, see the Jon Gray response.  His postulate that the root of your 
issues is machines swapping seems likely to me.

See below for some particular answers to your queries (thanks for the 
detail).

Jean-Adrien wrote:
> The above attempts can be:
> 1.
> java.io.IOException: java.io.IOException: Premeture EOF from inputStream
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
>   

Did you say your disks had filled?  If so, that is the likely cause of 
the above (though on our cluster here we've also been seeing this error 
and are looking at HADOOP-3831).

> 2-10
> java.io.IOException: java.io.IOException: java.lang.NullPointerException
>         at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
>
>   
Was there more of a stack trace for this error?  May I see it?  The 
above should never happen (smile).

...

> Another 10-attempt scenario I have seen:
> 1-10:
> IPC Server handler 3 on 60020, call getRow([B@1ec7483, [B@d54a92, null,
> 1224105427910, -1) from 192.168.1.11:55371: error: java.io.IOException:
> Cannot open filename
> /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
> java.io.IOException: Cannot open filename
> /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
>         at
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1171)
>
> Preceded, in the concerned regionserver log, by the line:
>
> 2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not
> obtain block blk_-3759213227484579481_226277 from any node: 
> java.io.IOException: No live nodes contain current block
>
>   
hdfs is hosed; it lost a block from the named file.  If hdfs is hosed, 
so is hbase.


> If I look for this block in the hadoop master log I can find
>
> 2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask
> 192.168.1.13:50010 to delete  [...] blk_-3759213227484579481_226277 [...]
> (many more blocks)
> about 16 min before.
>   

This is interesting.  I wonder why hdfs is deleting a block that a 
regionserver subsequently tries to use?  Can you correlate the block's 
story with hbase actions?  (That's probably an unfair question to ask 
since it would require deep detective work on the hbase logs, tracing 
the file whose block is missing and its hosting region as it moved 
around the cluster.)
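
If you do want to try, a rough start (an untested sketch; it assumes 
the default log locations under your hadoop and hbase install dirs) 
would be grepping for the block and for the store file in the namenode 
and regionserver logs:

  grep 'blk_-3759213227484579481' logs/hadoop-*-namenode-*.log
  grep '4558585535524295446' logs/hbase-*-regionserver-*.log
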
> In both cases the regionserver fails to serve the concerned region until I
> restart hbase (not hadoop).
>
>   
Not hadoop?  And if you run an fsck on the filesystem, is it healthy?
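
Something like the below (the usual fsck invocation, run from your 
hadoop install; adjust the path if your hbase root dir is elsewhere) 
would show whether hdfs itself thinks any blocks are missing or 
under-replicated:

  bin/hadoop fsck /hbase -files -blocks -locations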

> One last question by the way:
> Why is the replication factor of my hbase files in dfs 3, when my hadoop
> cluster is configured to keep only 2 copies?
>   
See http://wiki.apache.org/hadoop/Hbase/FAQ#12.

> Is it because the default config file (hadoop-default.xml) of the hadoop
> client embedded in the hbase distribution overrides the cluster
> configuration for the mapfiles created?
Yes.
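
If you want hbase-written files to pick up the cluster's replication 
factor, something like the below in your hbase-site.xml (or in a 
hadoop-site.xml on hbase's CLASSPATH) should do it; a sketch assuming 
the standard dfs.replication property:

  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>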

Thanks for the questions, J-A.
St.Ack
