hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: region server problem
Date Wed, 08 Oct 2008 21:41:54 GMT
Do you have DEBUG enabled?  Can I see the log from the regionserver that
went down?  Can you tell me more about your cluster: number of nodes,
number of regions?  What does your uploader look like (is it an MR job)?
Have you upped your file descriptors?
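
For the file descriptor and log checks, a minimal sketch in Python may help. It is an illustration only: the function names and pattern labels are hypothetical, the exact wording of the "too many open files" message can vary by JVM and OS (so it is matched loosely), and the block-related strings are taken from the messages quoted in this thread.

```python
import re
import resource  # Unix-only standard library module

# Patterns worth grepping for in regionserver and datanode logs.
# Labels are hypothetical; the strings come from messages in this thread.
PATTERNS = {
    "oome": re.compile(r"OutOfMemoryError"),
    "fd_exhausted": re.compile(r"too many open file", re.IGNORECASE),
    "lost_block": re.compile(
        r"No live nodes contain current block|Could not obtain block"
    ),
}


def scan_lines(lines):
    """Count how many lines match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def nofile_limit():
    """Return the (soft, hard) open-file limit of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

To use it, pass an open log file to `scan_lines` (a file object iterates line by line) and run `nofile_limit()` as the user that starts the daemons, since the limit is per process and inherited from that user's session.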

Thanks Slava.
St.Ack
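
On the long compaction times in the quoted thread: a rough way to see how compaction durations trend during an upload is to pull them out of the regionserver log. The sketch below is an assumption-laden illustration. The function name is hypothetical, and the regular expression is modelled only on the single "in 18mins, 39sec" message quoted in this thread, with an optional hours field guessed by analogy. It also expects each log record on one line, unlike the wrapped excerpt in the mail.

```python
import re

# Modelled on the one message seen in this thread; the "hrs" field is a guess.
COMPACTION_RE = re.compile(
    r"compaction completed on region (\S+) in "
    r"(?:(\d+)hrs, )?(?:(\d+)mins, )?(\d+)sec"
)


def compaction_seconds(line):
    """Return (region_name, duration_in_seconds), or None if no match."""
    match = COMPACTION_RE.search(line)
    if match is None:
        return None
    region, hours, minutes, seconds = match.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
    return region, total
```

Sorting the results by duration, or plotting them against the timestamps, should show whether compactions slow down as the upload progresses.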


Slava Gorelik wrote:
> Hi. I'm also encountering an error like this.
> I'm using HBase 0.18.0 and Hadoop 0.18.0.
> In addition to this error, region servers sometimes die. In the log I see
> the region server shut down after starting a compaction, because some
> data blocks are not found.
>
> Best Regards.
>
> On Wed, Oct 8, 2008 at 11:29 PM, stack <stack@duboce.net> wrote:
>
>   
>> You should update to 0.2.1 if you can.  Make sure you've upped your file
>> descriptors too:  See http://wiki.apache.org/hadoop/Hbase/FAQ#6.  Also see
>> how to enable DEBUG in same FAQ.
>>
>> Something odd is up when you see messages like this out of HDFS: ': No live
>> nodes contain current block*'.  That's lost data.
>>
>> Or messages like this, 'compaction completed on region
>> search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. that
>> compactions are taking so long -- would seem to indicate your machines are
>> severely overloaded or underpowered or both.  Can you study load when the
>> upload is running on these machines?  Perhaps try  throttling back to see if
>> hbase survives longer?
>>
>> The regionserver will output a thread dump in its RPC layer on a critical
>> error -- OOME -- or if it's been hung up for a long time, IIRC.
>>
>> Check the '.out' logs too for your hbase install to see if they contain any
>> errors.  Grep the datanode logs too for OOME or "too many open file
>> handles".
>>
>> St.Ack
>>
>> Rui Xing wrote:
>>
>>     
>>> Hi All,
>>>
>>> 1). We are doing performance testing on hbase. The test environment is
>>> 3 data nodes and 1 name node distributed over 4 machines. We started one
>>> region server on each data node. To insert the data, one insertion client
>>> is started on each data node machine. But as the data was inserted, the
>>> region servers crashed one by one. One of the causes is listed as follows:
>>>
>>> *==>
>>> 2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
>>> while reading from blk_-806310822584979460 of
>>> /hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
>>> 10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*
>>>
>>> ... ...
>>>
>>> *2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
>>> obtain block blk_-806310822584979460 from any node:
>>>  java.io.IOExceptionYou
>>>
>>> 2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> compaction completed on region search1,r3_1_3_c157476,1223360357528 in
>>> 18mins, 39sec
>>> 2008-10-07 14:52:25,238 INFO
>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>>> regionserver/0.0.0.0:60020.compactor exiting
>>> 2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> closed search1,r3_1_3_c157476,1223360357528
>>> 2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> closed -ROOT-,,0
>>> 2008-10-07 14:52:25,291 INFO
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
>>> 10.2.6.104:60020
>>> 2008-10-07 14:52:25,291 INFO
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
>>> 0.0.0.0:60020 exiting
>>> 2008-10-07 14:52:25,511 INFO
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
>>> thread.
>>> 2008-10-07 14:52:25,511 INFO
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread
>>> complete
>>> ===<
>>>
>>> 2). Another question is, under what circumstances will the region server
>>> print thread information like the logs below? It appears among the normal
>>> log records.
>>> ===>
>>> 35 active threads
>>> Thread 1281 (IPC Client connection to
>>> d3v1.corp.alimama.com/10.2.6.101:54310
>>> ):
>>>  State: RUNNABLE
>>>  Blocked count: 0
>>>  Waited count: 0
>>>  Stack:
>>>    java.util.Hashtable.remove(Hashtable.java:435)
>>>    org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
>>> ... ...
>>> ===<
>>>
>>> We use hadoop 0.17.1 and hbase 0.2.0. Any clues would be greatly
>>> appreciated.
>>>
>>> Regards,
>>> -Ray
>>>
>>>
>>>
>>>       
>>     
>
>   

