hbase-user mailing list archives

From Jean-Adrien <a...@jeanjean.ch>
Subject Re: Regionserver fails to serve region
Date Mon, 20 Oct 2008 09:38:14 GMT


stack-3 wrote:
> 
> First, see the Jon Gray response.  His postulate that the root of your
> issues is machines swapping seems likely to me.
> 
> 
> See below for some particular answers to your queries (thanks for the 
> detail).
> 
> Jean-Adrien wrote:
>> The attempts mentioned above can be:
>> 1.
>> java.io.IOException: java.io.IOException: Premeture EOF from inputStream
>>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
>>   
> 
> Did you say your disks had filled?  If so, this is likely the cause of the
> above (but on our cluster here, we've also been seeing the above and are
> looking at HADOOP-3831).
> 
> 

Yes, one is.
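
(For what it's worth, the Premeture EOF message comes from org.apache.hadoop.io.IOUtils.readFully when the stream runs out before the requested number of bytes has been read, which fits a block cut short by a full disk. Below is a minimal sketch, assuming the readFully(InputStream, byte[], int, int) signature of this era's Hadoop; the byte counts are made up.)

import java.io.ByteArrayInputStream;
import java.io.IOException;
import org.apache.hadoop.io.IOUtils;

public class TruncatedRead {
    public static void main(String[] args) throws IOException {
        byte[] onDisk = new byte[10];    // only 10 bytes actually made it to disk
        byte[] wanted = new byte[64];    // the reader asks for the full 64 bytes
        ByteArrayInputStream in = new ByteArrayInputStream(onDisk);
        // Throws java.io.IOException complaining about a premature EOF,
        // because the stream ends before 64 bytes could be read.
        IOUtils.readFully(in, wanted, 0, wanted.length);
    }
}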


stack-3 wrote:
> 
>> 2-10
>> java.io.IOException: java.io.IOException: java.lang.NullPointerException
>>         at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
>>
>>   
> Was there more stack trace on this error?  May I see it?  The above should
> never happen (smile).
> 

Sure. Enjoy. Take into account that it happens after the above Premeture EOF.


2008-10-14 14:23:55,705 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60020, call getRow([B@17dc1ef, [B@1474316, null, 9223372036854775807, -1) from 192.168.1.10:49676: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
        at org.apache.hadoop.hbase.HStoreKey$HStoreKeyWritableComparator.compare(HStoreKey.java:593)
        at org.apache.hadoop.io.MapFile$Reader.seekInternal(MapFile.java:436)
        at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:558)
        at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:541)
        at org.apache.hadoop.hbase.regionserver.HStoreFile$BloomFilterMapFile$Reader.getClosest(HStoreFile.java:761)
        at org.apache.hadoop.hbase.regionserver.HStore.getFullFromMapFile(HStore.java:1179)
        at org.apache.hadoop.hbase.regionserver.HStore.getFull(HStore.java:1160)
        at org.apache.hadoop.hbase.regionserver.HRegion.getFull(HRegion.java:1221)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRow(HRegionServer.java:1036)
        at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
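
(Side note, in case it helps anyone reading the archives: a compareTo over a composite key blows up exactly like this when one of the keys it is handed has a null field, for instance a key deserialized from a truncated or corrupt mapfile. The class and field names below are made up; this is only a hedged sketch of that failure mode, not the actual HStoreKey code.)

// Hypothetical illustration only -- NOT the real HStoreKey.
// Shows how compareTo() throws java.lang.NullPointerException when a field
// of one key was never set, e.g. because the bytes backing it were cut short.
public class FakeStoreKey implements Comparable<FakeStoreKey> {
    private final byte[] row;     // may end up null if deserialization was truncated
    private final byte[] column;

    public FakeStoreKey(byte[] row, byte[] column) {
        this.row = row;
        this.column = column;
    }

    public int compareTo(FakeStoreKey other) {
        int result = compareBytes(this.row, other.row);   // NPE if either row is null
        if (result != 0) {
            return result;
        }
        return compareBytes(this.column, other.column);
    }

    private static int compareBytes(byte[] left, byte[] right) {
        // left.length dereferences left: this is where a null field explodes.
        for (int i = 0; i < Math.min(left.length, right.length); i++) {
            int diff = (left[i] & 0xff) - (right[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return left.length - right.length;
    }

    public static void main(String[] args) {
        FakeStoreKey good = new FakeStoreKey("row1".getBytes(), "header:".getBytes());
        FakeStoreKey broken = new FakeStoreKey(null, "header:".getBytes()); // simulates a half-read key
        good.compareTo(broken);   // throws java.lang.NullPointerException, same shape as the trace above
    }
}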


stack-3 wrote:
> 
> 
>> Another 10-attempt scenario I have seen:
>> 1-10:
>> IPC Server handler 3 on 60020, call getRow([B@1ec7483, [B@d54a92, null, 1224105427910, -1) from 192.168.1.11:55371: error: java.io.IOException: Cannot open filename /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
>> java.io.IOException: Cannot open filename /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
>>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1171)
>>
>> Preceded, in the concerned regionserver log, by the line:
>>
>> 2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not obtain block blk_-3759213227484579481_226277 from any node: java.io.IOException: No live nodes contain current block
>>
>>   
> hdfs is hosed; it lost a block from the named file.  If hdfs is hosed, 
> so is hbase.
> 
> 
>> If I look for this block in the hadoop master log I can find
>>
>> 2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask 192.168.1.13:50010 to delete [...] blk_-3759213227484579481_226277 [...] (many more blocks)
>>   
> 
> This is interesting.  I wonder why hdfs is deleting a block that a
> regionserver subsequently tries to use?  Can you correlate the block's
> story with hbase actions?  (That's probably an unfair question to ask,
> since it would require deep detective work on the hbase logs, tracing the
> file whose block is missing and its hosting region as it moved around the
> cluster.)
> 
> 

I have noticed no correlation so far. I'll try to play detective a bit.
If I notice something, I'll post it here.


stack-3 wrote:
> 
> 
> 
>> about 16 min before.
>> In both cases the regionserver fails to serve the concerned region until
>> I restart hbase (not hadoop).
>>
>>   
> Not hadoop?  And if you ran an fsck on the filesystem, it's healthy?
> 
> 

Not hadoop. fsck says it's healthy.


stack-3 wrote:
> 
> 
>> One last question by the way:
>> Why is the replication factor of my hbase files in dfs 3, when my hadoop
>> cluster is configured to keep only 2 copies?
>>   
> See http://wiki.apache.org/hadoop/Hbase/FAQ#12.
> 
>> Is it because the default (hadoop-default.xml) config file of the hadoop
>> client, which is embedded in the hbase distribution, overrides the cluster
>> configuration for the mapfiles created?
> Yes.
> 
> Thanks for the questions J-A.
> St.Ack
> 
> 

Thank you too.
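
For completeness, a minimal sketch of the replication point, assuming the stock org.apache.hadoop.conf.Configuration behaviour of this Hadoop era (the class name below is made up): a plain Configuration is built from the hadoop-default.xml found on the client's own classpath, which for the HBase processes is the copy bundled with HBase, so dfs.replication comes out as 3 unless the cluster's hadoop-site.xml is also on that classpath or the value is overridden explicitly.

import org.apache.hadoop.conf.Configuration;

public class ReplicationCheck {
    public static void main(String[] args) {
        // Loads hadoop-default.xml (and hadoop-site.xml, if present) from the
        // classpath of *this* JVM -- i.e. HBase's classpath, not the cluster's.
        Configuration conf = new Configuration();

        // With only the bundled hadoop-default.xml visible, this prints 3,
        // whatever the namenode and datanodes were configured with.
        System.out.println("dfs.replication = " + conf.get("dfs.replication"));

        // Two ways around it: drop the cluster's hadoop-site.xml onto HBase's
        // classpath, or override the value before any files get created.
        conf.setInt("dfs.replication", 2);
    }
}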

-- 
View this message in context: http://www.nabble.com/Regionserver-fails-to-serve-region-tp20028553p20066104.html
Sent from the HBase User mailing list archive at Nabble.com.

