Return-Path: Delivered-To: apmail-hadoop-hbase-user-archive@minotaur.apache.org Received: (qmail 42938 invoked from network); 20 Oct 2009 22:12:58 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 20 Oct 2009 22:12:58 -0000 Received: (qmail 28274 invoked by uid 500); 20 Oct 2009 22:12:57 -0000 Delivered-To: apmail-hadoop-hbase-user-archive@hadoop.apache.org Received: (qmail 28234 invoked by uid 500); 20 Oct 2009 22:12:57 -0000 Mailing-List: contact hbase-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hbase-user@hadoop.apache.org Delivered-To: mailing list hbase-user@hadoop.apache.org Received: (qmail 28224 invoked by uid 99); 20 Oct 2009 22:12:57 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Oct 2009 22:12:57 +0000 X-ASF-Spam-Status: No, hits=-2.6 required=5.0 tests=BAYES_00 X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of elsif.then@gmail.com designates 72.14.220.156 as permitted sender) Received: from [72.14.220.156] (HELO fg-out-1718.google.com) (72.14.220.156) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Oct 2009 22:12:54 +0000 Received: by fg-out-1718.google.com with SMTP id d23so1991227fga.11 for ; Tue, 20 Oct 2009 15:12:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from :user-agent:mime-version:to:subject:content-type; bh=UtTQvp87HNAzzFBnwoSST0Z6tSlyLgvPggE/e43SK+0=; b=I2qVrOKXN3r+gsloPLld8uR0y6lYSNECa+2r5pVY79XpjWaNDnbkR0aOVNDUcaJyuU ReEo2T+zWrJ7nkXN9BKlibMHs/uHP3HN5lVdV6KpJFdZDyYmwRsJ/ABn2x88puzYDFku av4JAbt8XBRxH1/QsFz4icjxF6aHP6BJZ+1Sc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:subject :content-type; b=E9//IFhXC86MeQT4d8wLFhiG6O9BXKVNnE4KFjdMHWDMB1o6uJobFt6tXcsxN4mNLI jWY34nt2J3maZ/VlOq+qkU0JGR4Qk1r+OdxuaSlLleAaRRtuCrdQpAJCV79b3/eO4BN0 pErd8/vVnI+dVmg1xpTnTh/tOn0FVLjf0AwTE= Received: by 10.103.80.25 with SMTP id h25mr2963352mul.15.1256076752739; Tue, 20 Oct 2009 15:12:32 -0700 (PDT) Received: from ?127.0.0.1? (EDDD.MIT.EDU [18.246.2.79]) by mx.google.com with ESMTPS id 23sm492135mun.33.2009.10.20.15.12.20 (version=SSLv3 cipher=RC4-MD5); Tue, 20 Oct 2009 15:12:31 -0700 (PDT) Message-ID: <4ADE35DC.1040003@gmail.com> Date: Tue, 20 Oct 2009 15:12:44 -0700 From: elsif User-Agent: Thunderbird 2.0.0.23 (X11/20090812) MIME-Version: 1.0 To: hbase-user@hadoop.apache.org Subject: HBase Exceptions on version 0.20.1 Content-Type: multipart/mixed; boundary="------------060307060406010404020007" --------------060307060406010404020007 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Hello Everyone, We are experiencing many HBase exceptions while running the attached test program using HBase 0.20.1 and Hadoop 0.20.1 (r810220) releases on a Hadoop cluster with 14 data nodes. The HBase operational performance degrades quickly once certain error conditions are encountered especially if there are issues with the META region. Has anyone else seen these exceptions listed below? We ran the test with 3 region servers and then with 14 region servers. In both cases, we still got exceptions. In each test run, we shutdown HBase, removed the main /hbase folder, restarted HBase and then ran the test program. The test program uses a table named "fc_test". The program is started by the following command: /opt/hbase/bin/hbase org.apache.hadoop.hbase.util.TableTest 30 4000000 The attached HBase test program starts 30 threads. Each thread loops, inserting randomly generated keys along with a fixed data pattern of 4,000,000 bytes. After running the test for 2-12 hours, the table scanning of "fc_test" is broken and only a limited number of put operations are still succeeding. EXCEPTION LIST ------------------------------ 2009-10-15 03:28:39,005 DEBUG org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered on store data; time since last major compaction 89677908ms 2009-10-15 03:28:39,012 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 file(s) into /hbase/fc_test/compaction.dir/1073627844, seqid=396374 2009-10-15 03:28:39,245 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-6054029382496843671_324379 from any node: java.io.IOException: No live nodes contain current block ------------------------------ 2009-10-15 07:41:43,426 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x2244f6194a5000d after 1ms 2009-10-15 07:41:53,993 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x2244f6194a5000d to sun.nio.ch.SelectionKeyImpl@139f5ca java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2009-10-15 07:41:54,344 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null 2009-10-15 07:41:55,084 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server hfs-029014/10.1.29.14:2181 2009-10-15 07:41:55,085 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.1.29.19:39421 remote=hfs-029014/10.1.29.14:2181] ------------------------------ 2009-10-15 08:37:40,982 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020 caught: java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source) at sun.nio.ch.SocketChannelImpl.write(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943) ------------------------------ 2009-10-15 07:57:28,025 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=4.652237MB (4878224), Free=987.6603MB (1035636848), Max=992.3125MB (1040515072), Counts: Blocks=0, Access=51003, Hit=0, Miss=51003, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, Evicted/Run=NaN 2009-10-15 07:57:28,189 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException 2009-10-15 07:57:28,190 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7166000434569842394_332632 2009-10-15 07:57:28,306 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 5 on 60020 took 2297ms appending an edit to hlog; editcount=146 ----------------------------- 2009-10-15 09:10:00,559 INFO org.apache.zookeeper.ClientCnxn: Exception while closing send thread for session 0x1244f6194780008 : Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x1244f6194780008 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down ------------------------------ 2009-10-13 12:30:58,207 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read master address from ZooKeeper. Retrying. Error was: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:331) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:240) at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1338) at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1370) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:427) at java.lang.Thread.run(Unknown Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:327) ------------------------------ 2009-10-14 15:12:42,664 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1 org.apache.hadoop.hbase.Leases$LeaseStillHeldException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94) at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48) at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571) at java.lang.Thread.run(Unknown Source) ------------------------------ 2009-10-14 15:12:43,566 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x1244f619478000d to sun.nio.ch.SelectionKeyImpl@15e32c4 java.io.IOException: Session Expired at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589) at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) 2009-10-14 15:12:43,567 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Expired, type: None, path: null 2009-10-14 15:12:43,568 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired ------------------------------ 2009-10-14 15:12:51,563 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: Cannot append; log is closed at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:584) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1445) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1244) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208) at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1831) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) ------------------------------ 2009-10-14 15:13:51,852 INFO org.apache.hadoop.fs.FileSystem: Could not cancel cleanup thread, though no FileSystems are open 2009-10-14 15:13:51,864 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down HRegionServer: file system not available java.io.IOException: File system is not available at org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125) at org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:901) at org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:849) at org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:832) at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1839) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) Caused by: java.io.IOException: Call to hfs-029010/10.1.29.10:8020 failed on local exception: java.nio.channels.ClosedByInterruptException at org.apache.hadoop.ipc.Client.wrapException(Client.java:774) at org.apache.hadoop.ipc.Client.call(Client.java:742) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) at $Proxy1.getFileInfo(Unknown Source) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy1.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643) at org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114) ... 9 more Caused by: java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown Source) at sun.nio.ch.SocketChannelImpl.write(Unknown Source) at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.flushBuffer(Unknown Source) at java.io.BufferedOutputStream.flush(Unknown Source) at java.io.DataOutputStream.flush(Unknown Source) at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:480) at org.apache.hadoop.ipc.Client.call(Client.java:720) ------------------------------ PREVIOUS EXCEPTIONS FROM OTHER TESTS INCLUDE 2009-10-13 00:00:00,508 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for ... .... at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1527) at org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1559) at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1627) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2277) at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1768) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) ------------------------------ 2009-10-12 10:06:57,809 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020, call put([B@1f5b92a, [Lorg.apache.hadoop.hbase.client.Put;@14f8e05) from 10.1.29.10:36017: error: java.io.IOException: Server not running, aborting java.io.IOException: Server not running, aborting at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) 2009-10-12 10:06:57,812 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=12, stores=24, storefiles=15, storefileIndexSize=8, memstoreSize=378, usedHeap=441, maxHeap=992, blockCacheSize=4878224, blockCacheFree=1035636848, blockCacheCount=0, blockCacheHitRatio=0 2009-10-12 10:06:57,813 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, call put([B@1f52982, [Lorg.apache.hadoop.hbase.client.Put;@78283f) from 10.1.29.10:36017: error: java.io.IOException: Server not running, aborting java.io.IOException: Server not running, aborting at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) ------------------------------ . --------------060307060406010404020007--