hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: region server problem
Date Wed, 08 Oct 2008 21:29:16 GMT
You should update to 0.2.1 if you can.  Make sure you've upped your file 
descriptors too; see http://wiki.apache.org/hadoop/Hbase/FAQ#6.  The same 
FAQ also explains how to enable DEBUG logging.
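
As a quick check before digging further, a small script can report the
open-file limit that a process started from the same shell would
inherit.  This is only a sketch, not something from the FAQ: the helper
names are made up, and the 1024 threshold is an assumption based on the
usual Linux default rather than any figure HBase enforces.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def looks_too_low(soft, threshold=1024):
    """Flag a soft limit at or below `threshold` (an assumed cut-off)."""
    if soft == resource.RLIM_INFINITY:
        return False  # unlimited, so nothing to raise
    return soft <= threshold


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)
    if looks_too_low(soft):
        print("soft limit looks low for a datanode/regionserver")
```

Run it as the user that starts the datanode and regionserver, since
limits are per user and per shell.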

Something odd is up when you see messages like this out of HDFS: 'No 
live nodes contain current block'.  That's lost data.

Messages like this one, 'compaction completed on region 
search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. 
compactions taking that long -- would seem to indicate your machines 
are severely overloaded or underpowered or both.  Can you study load on 
these machines while the upload is running?  Perhaps try throttling back 
the upload to see if hbase survives longer?
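
To see how compaction times trend over a run, the durations can be
pulled out of the regionserver log.  A minimal sketch, assuming each log
record sits on one line and the duration is printed as in the line
quoted above; the optional "hrs" part and the function name are
assumptions, not something confirmed for this hbase version.

```python
import re

# Matches e.g. "compaction completed on region <name> in 18mins, 39sec".
# The hours and minutes groups are optional (assumed format).
COMPACTION_RE = re.compile(
    r"compaction completed on region (\S+) in "
    r"(?:(\d+)hrs, )?(?:(\d+)mins, )?(\d+)sec"
)


def parse_compaction(line):
    """Return (region_name, duration_in_seconds), or None if no match."""
    match = COMPACTION_RE.search(line)
    if match is None:
        return None
    region, hours, minutes, seconds = match.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
    return region, total
```

Feeding every line of a regionserver log through parse_compaction and
printing the non-None results gives a per-region list of compaction
times to compare against machine load.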

The regionserver will output a thread dump from its RPC layer on a 
critical error -- an OOME, say -- or if it's been hung up for a long 
time, IIRC.

Check the '.out' logs for your hbase install too, to see if they contain 
any errors.  Grep the datanode logs as well for OOME or "too many open 
file handles".
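
If plain grep gets tedious across several machines, the same search can
be scripted.  A rough sketch only: the pattern names and functions are
made up for illustration, and the search strings are the messages
mentioned in this mail plus the usual Java exception text, so adjust
them to whatever your logs actually say.

```python
import re
from collections import Counter

# Case-insensitive markers worth counting in '.out' and datanode logs.
PATTERNS = {
    "oome": re.compile(r"OutOfMemoryError", re.IGNORECASE),
    "open_files": re.compile(r"too many open file", re.IGNORECASE),
    "lost_block": re.compile(r"No live nodes contain current block",
                             re.IGNORECASE),
}


def scan_lines(lines):
    """Count how many lines match each marker in PATTERNS."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path):
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

Call scan_file on each log path and print the counters; any non-zero
count is a file worth reading by hand.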

St.Ack

Rui Xing wrote:
> Hi All,
>
> 1). We are doing performance testing on hbase. The environment of the
> testing is 3 data nodes, and 1 name node distributed on 4 machines. We
> started one region server on each data node respectively. To insert the
> data, one insertion client is started on each data node machine. But as the
> data inserted, the region servers crashed one by one. One of the reasons is
> listed as follows:
>
> *==>
> 2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
> while reading from blk_-806310822584979460 of
> /hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
> 10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*
>
> ... ...
>
> *2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
> obtain block blk_-806310822584979460 from any node:  java.io.IOException
> 2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region search1,r3_1_3_c157476,1223360357528 in
> 18mins, 39sec
> 2008-10-07 14:52:25,238 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> regionserver/0.0.0.0:60020.compactor exiting
> 2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> closed search1,r3_1_3_c157476,1223360357528
> 2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> closed -ROOT-,,0
> 2008-10-07 14:52:25,291 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
> 10.2.6.104:60020
> 2008-10-07 14:52:25,291 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
> 0.0.0.0:60020 exiting
> 2008-10-07 14:52:25,511 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
> thread.
> 2008-10-07 14:52:25,511 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> ===<
>
> 2). Another question is, under what circumstance will the region server
> print logs of the thread information as below? It appears among the normal
> log records.
> ===>
> 35 active threads
> Thread 1281 (IPC Client connection to d3v1.corp.alimama.com/10.2.6.101:54310
> ):
>   State: RUNNABLE
>   Blocked count: 0
>   Waited count: 0
>   Stack:
>     java.util.Hashtable.remove(Hashtable.java:435)
>     org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
> ... ...
> ===<
>
> We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated if any
> clues can be dropped.
>
> Regards,
> -Ray
>
>   

