hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: hbase compaction stuck
Date Mon, 16 Mar 2015 16:19:39 GMT
Your networking is broken. Fix the 'java.net.NoRouteToHostException: No
route to host' exceptions, then come back to this list if there are still issues.
Yours,
St.Ack
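(A quick way to confirm the diagnosis above from the region server's side is a plain
TCP connect to the DataNode's data-transfer port, 50010 in the log below. This is
only a minimal sketch; the host and port defaults are illustrative placeholders taken
from the exception, not from any real configuration.)

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal connectivity probe: performs the same kind of TCP connect the
// DFSClient makes to a DataNode's data-transfer port. A host with no route
// raises NoRouteToHostException, reproducing the failure in the log below.
public class DataNodeProbe {

    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // NoRouteToHostException is a subclass of IOException and lands here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder DataNode address; pass the real one from the log as args.
        String host = args.length > 0 ? args[0] : "10.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;
        System.out.println(host + ":" + port + " "
                + (isReachable(host, port, 5000) ? "reachable" : "unreachable"));
    }
}
```

If this probe fails from the region server host while the NameNode still lists the
DataNode as live, the problem is routing/firewalling between the two machines, not HBase.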

On Mon, Mar 16, 2015 at 7:54 AM, Chen Song <chen.song.82@gmail.com> wrote:

> We run an HBase cluster, version 0.98.1+cdh5.1.0, with automatic
> compaction enabled. I have noticed a few times that compaction gets
> stuck under the following circumstances.
>
> 1. A server in the cluster goes hard down (physically dead).
> 2. At the same time, any region server that is running a major compaction
> and requesting data blocks from the dead server logs the following
> exception:
>
> 2015-03-16 03:51:19,621 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
> connect to /10.0.xx.xx:50010 for block, add to deadNodes and continue.
> java.net.NoRouteToHostException: No route to host
> java.net.NoRouteToHostException: No route to host
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>         at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2765)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:746)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:661)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:325)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:566)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:789)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:563)
>         at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1215)
>         at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1432)
>         at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)
>         at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:494)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:515)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:237)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:176)
>         at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1761)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3734)
>         at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1950)
>         at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1936)
>         at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1913)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3068)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>
>
> 3. After a few retries, the compaction makes no progress and runs for
> hours before it is killed manually.
> 4. During that time span, the region is unreachable from clients;
> clients always see a TimeoutException.
>
> Any thoughts on this issue, or workarounds I can try? Any feedback
> is greatly appreciated.
>
> Chen
>
