hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From stack <st...@duboce.net>
Subject Re: HBase Exceptions on version 0.20.1
Date Mon, 09 Nov 2009 19:18:05 GMT
I ran your program elsif against our cluster.  It ran for about two days,
loaded up 1500 regions into 3-node cluster and then each of the RS's
variously timed out against zookeeper and shut themselves down.   I'll take
a look at why the session timeout.   The upload is pretty slow.  My guess is
that the client with its 30 threads is contending over the map of region
locations (Ryan noted the last day that all writes are blocked while the
search for new region location is being done).  I can take a look into this
too.

St.Ack


On Tue, Oct 20, 2009 at 2:12 PM, elsif <elsif.then@gmail.com> wrote:

>
> Hello Everyone,
>
> We are experiencing many HBase exceptions while running the attached
> test program using
> HBase 0.20.1 and Hadoop 0.20.1 (r810220) releases on a Hadoop cluster
> with 14 data nodes.
>
> The HBase operational performance degrades quickly once certain error
> conditions are
> encountered especially if there are issues with the META region.
>
> Has anyone else seen these exceptions listed below?
>
> We ran the test with 3 region servers and then with 14 region servers.
> In both cases, we still got exceptions.
>
> In each test run, we shutdown HBase, removed the main /hbase
> folder, restarted HBase and then ran the test program.
> The test program uses a table named "fc_test". The program is started by
> the following command:
>
> /opt/hbase/bin/hbase org.apache.hadoop.hbase.util.TableTest  30  4000000
>
> The attached HBase test program starts 30 threads. Each thread loops,
> inserting randomly generated keys
> along with a fixed data pattern of 4,000,000 bytes.
>
> After running the test for 2-12 hours, the table scanning of "fc_test"
> is broken and only a limited number of put
> operations are still succeeding.
>
>
>
> EXCEPTION LIST
>
> ------------------------------
> 2009-10-15 03:28:39,005 DEBUG
> org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered
> on store data; time since last major compaction 89677908ms
> 2009-10-15 03:28:39,012 DEBUG
> org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4
> file(s)  into /hbase/fc_test/compaction.dir/1073627844, seqid=396374
> 2009-10-15 03:28:39,245 INFO org.apache.hadoop.hdfs.DFSClient: Could not
> obtain block blk_-6054029382496843671_324379 from any node:
> java.io.IOException: No live nodes contain current block
> ------------------------------
> 2009-10-15 07:41:43,426 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
> response for sessionid:0x2244f6194a5000d after 1ms
> 2009-10-15 07:41:53,993 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x2244f6194a5000d to sun.nio.ch.SelectionKeyImpl@139f5ca
> <mailto:sun.nio.ch.SelectionKeyImpl@139f5ca>
> java.io.IOException: TIMED OUT
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> 2009-10-15 07:41:54,344 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> state: Disconnected, type: None, path: null
> 2009-10-15 07:41:55,084 INFO org.apache.zookeeper.ClientCnxn: Attempting
> connection to server hfs-029014/10.1.29.14:2181
> 2009-10-15 07:41:55,085 INFO org.apache.zookeeper.ClientCnxn: Priming
> connection to java.nio.channels.SocketChannel[connected
> local=/10.1.29.19:39421 remote=hfs-029014/10.1.29.14:2181]
> ------------------------------
> 2009-10-15 08:37:40,982 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 6 on 60020 caught: java.nio.channels.ClosedChannelException
>    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
>    at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125)
>    at
>
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615)
>    at
>
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943)
> ------------------------------
> 2009-10-15 07:57:28,025 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=4.652237MB (4878224), Free=987.6603MB (1035636848), Max=992.3125MB
> (1040515072), Counts: Blocks=0, Access=51003, Hit=0, Miss=51003,
> Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%,
> Evicted/Run=NaN
> 2009-10-15 07:57:28,189 INFO org.apache.hadoop.hdfs.DFSClient: Exception
> in createBlockOutputStream java.io.EOFException
> 2009-10-15 07:57:28,190 INFO org.apache.hadoop.hdfs.DFSClient:
> Abandoning block blk_7166000434569842394_332632
> 2009-10-15 07:57:28,306 WARN org.apache.hadoop.hbase.regionserver.HLog:
> IPC Server handler 5 on 60020 took 2297ms appending an edit to hlog;
> editcount=146
> -----------------------------
> 2009-10-15 09:10:00,559 INFO org.apache.zookeeper.ClientCnxn: Exception
> while closing send thread for session 0x1244f6194780008 : Read error rc
> = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
> 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn:
> Disconnecting ClientCnxn for session: 0x1244f6194780008
> 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn:
> EventThread shut down
> ------------------------------
> 2009-10-13 12:30:58,207 WARN
> org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read
> master address from ZooKeeper. Retrying. Error was:
> java.io.IOException:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for /hbase/master
>    at
>
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:331)
>    at
>
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:240)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1338)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1370)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:427)
>    at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /hbase/master
>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892)
>    at
>
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:327)
> ------------------------------
> 2009-10-14 15:12:42,664 WARN
> org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
>    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
> Source)
>    at java.lang.reflect.Constructor.newInstance(Unknown Source)
>    at
>
> org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
>    at
>
> org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
>    at
>
> org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571)
>    at java.lang.Thread.run(Unknown Source)
> ------------------------------
> 2009-10-14 15:12:43,566 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x1244f619478000d to sun.nio.ch.SelectionKeyImpl@15e32c4
> <mailto:sun.nio.ch.SelectionKeyImpl@15e32c4>
> java.io.IOException: Session Expired
>    at
>
> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
>    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
>    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
> 2009-10-14 15:12:43,567 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> state: Expired, type: None, path: null
> 2009-10-14 15:12:43,568 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session
> expired
> ------------------------------
> 2009-10-14 15:12:51,563 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> java.io.IOException: Cannot append; log is closed
>    at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:584)
>    at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1445)
>    at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1244)
>    at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1831)
>    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>    at java.lang.reflect.Method.invoke(Unknown Source)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> ------------------------------
> 2009-10-14 15:13:51,852 INFO org.apache.hadoop.fs.FileSystem: Could not
> cancel cleanup thread, though no FileSystems are open
> 2009-10-14 15:13:51,864 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down
> HRegionServer: file system not available
> java.io.IOException: File system is not available
>    at
>
> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:901)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:849)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:832)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1839)
>    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>    at java.lang.reflect.Method.invoke(Unknown Source)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> Caused by: java.io.IOException: Call to hfs-029010/10.1.29.10:8020
> failed on local exception: java.nio.channels.ClosedByInterruptException
>    at org.apache.hadoop.ipc.Client.wrapException(Client.java:774)
>    at org.apache.hadoop.ipc.Client.call(Client.java:742)
>    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>    at $Proxy1.getFileInfo(Unknown Source)
>    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>    at java.lang.reflect.Method.invoke(Unknown Source)
>    at
>
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>    at
>
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>    at $Proxy1.getFileInfo(Unknown Source)
>    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615)
>    at
>
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
>    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
>    at
>
> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114)
>    ... 9 more
> Caused by: java.nio.channels.ClosedByInterruptException
>    at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown
> Source)
>    at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>    at
>
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
>    at
>
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>    at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>    at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>    at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
>    at java.io.BufferedOutputStream.flush(Unknown Source)
>    at java.io.DataOutputStream.flush(Unknown Source)
>    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:480)
>    at org.apache.hadoop.ipc.Client.call(Client.java:720)
> ------------------------------
>
> PREVIOUS EXCEPTIONS FROM OTHER TESTS INCLUDE
>
> 2009-10-13 00:00:00,508 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row
> out of range for ...
>    ....
>    at
> org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1527)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1559)
>    at
> org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1627)
>    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2277)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1768)
>    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>    at java.lang.reflect.Method.invoke(Unknown Source)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> ------------------------------
> 2009-10-12 10:06:57,809 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 7 on 60020, call put([B@1f5b92a,
> [Lorg.apache.hadoop.hbase.client.Put;@14f8e05) from 10.1.29.10:36017:
> error: java.io.IOException: Server not running, aborting
> java.io.IOException: Server not running, aborting
>  at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299)
>   at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806)
>  at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>  at java.lang.reflect.Method.invoke(Unknown Source)
>  at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650)
>  at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> 2009-10-12 10:06:57,812 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> request=0.0, regions=12, stores=24, storefiles=15, storefileIndexSize=8,
> memstoreSize=378, usedHeap=441, maxHeap=992, blockCacheSize=4878224,
> blockCacheFree=1035636848, blockCacheCount=0, blockCacheHitRatio=0
> 2009-10-12 10:06:57,813 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 3 on 60020, call put([B@1f52982,
> [Lorg.apache.hadoop.hbase.client.Put;@78283f) from 10.1.29.10:36017:
> error: java.io.IOException: Server not running, aborting
> java.io.IOException: Server not running, aborting
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299)
>   at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806)
>  at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>  at java.lang.reflect.Method.invoke(Unknown Source)
>  at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650)
>  at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> ------------------------------
>
>
>
>
>
> .
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message