hbase-user mailing list archives

From Nicolas Liochon <nkey...@gmail.com>
Subject Re: hbase compaction stuck
Date Mon, 16 Mar 2015 17:10:24 GMT
If the node is dead, the NoRouteToHostException is expected.
Compaction staying stuck could be an HDFS or HBase bug (or something else).

For how long do you see the NoRouteToHostException?

Basically, HBase will keep trying to use that node until HDFS discovers that
the node is stale or dead.
With the default HDFS settings, you should not see it after about 10 minutes
30 seconds.
And, if you configured the 'stale node' behavior in HDFS, you should not
see it after ~30 seconds (depending on your config).
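
For reference, the 10:30 figure can be reproduced from the HDFS defaults. This
is only a sketch: it assumes the NameNode marks a DataNode dead after
2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval, with
the stock defaults of 5 minutes and 3 seconds. The class and method names are
made up for illustration and are not part of HDFS.

```java
import java.time.Duration;

public class DeadNodeTimeout {
    // Assumed formula: a DataNode is declared dead after
    // 2 * recheck interval + 10 * heartbeat interval without a heartbeat.
    static Duration deadNodeInterval(Duration recheckInterval, Duration heartbeatInterval) {
        return recheckInterval.multipliedBy(2).plus(heartbeatInterval.multipliedBy(10));
    }

    public static void main(String[] args) {
        // Assumed default of dfs.namenode.heartbeat.recheck-interval (300000 ms).
        Duration recheck = Duration.ofMinutes(5);
        // Assumed default of dfs.heartbeat.interval (3 s).
        Duration heartbeat = Duration.ofSeconds(3);

        Duration dead = deadNodeInterval(recheck, heartbeat);
        System.out.println(dead.getSeconds() + " seconds"); // prints "630 seconds"
    }
}
```

The ~30 second figure corresponds to dfs.namenode.stale.datanode.interval
(30000 ms by default), which only affects reads if
dfs.namenode.avoid.read.stale.datanode is set to true. Check the values in
your own hdfs-site.xml, since your distribution may ship different ones.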


On Mon, Mar 16, 2015 at 5:19 PM, Stack <stack@duboce.net> wrote:

> Your networking is broken. Fix the 'java.net.NoRouteToHostException: No
> route to host' exceptions, then come back to this list if you still have issues.
> Yours,
> St.Ack
>
> On Mon, Mar 16, 2015 at 7:54 AM, Chen Song <chen.song.82@gmail.com> wrote:
>
> > We ran an HBase cluster with version 0.98.1+cdh5.1.0 and with auto
> > compaction. I have noticed a few times that compaction gets stuck under the
> > following circumstances.
> >
> > 1. Some server in the cluster is hard dead and physically down.
> > 2. At the same time, some region servers are running major compaction and
> > requesting data blocks from the dead server. The following exception is
> > seen in the region server log.
> >
> > 2015-03-16 03:51:19,621 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
> > connect to /10.0.xx.xx:50010 for block, add to deadNodes and continue.
> > java.net.NoRouteToHostException: No route to host
> > java.net.NoRouteToHostException: No route to host
> >         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> >         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> >         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
> >         at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2765)
> >         at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:746)
> >         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:661)
> >         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:325)
> >         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:566)
> >         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:789)
> >         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836)
> >         at java.io.DataInputStream.read(DataInputStream.java:149)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:563)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1215)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1432)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)
> >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
> >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:494)
> >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:515)
> >         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:237)
> >         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
> >         at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
> >         at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:176)
> >         at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1761)
> >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3734)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1950)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1936)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1913)
> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3068)
> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
> >
> >
> > 3. After a few tries, the compaction makes no progress and runs for hours
> > before it is killed manually.
> > 4. During that time span, the region is unreachable from clients. Clients
> > always see TimeoutException.
> >
> > Any thoughts on this issue, or a workaround I can apply? Any feedback
> > is greatly appreciated.
> >
> > Chen
> >
>
