hbase-user mailing list archives

From Lucas Nazário dos Santos <nazario.lu...@gmail.com>
Subject Re: HBase cracks on EC2
Date Wed, 16 Sep 2009 17:48:53 GMT
Thanks, Andy.

It's only a test with one large instance (7.5 GB of memory and 4 EC2 compute
units). I had already tried a cluster with 3 small instances, but the same
error occurred (I know small instances are not recommended, which is why
I'm trying a large instance now). Could the number of instances I'm using
be the cause?

The Hadoop logs are clean.

The line below caught my attention. I'm wondering whether the problem is
related to the issue http://issues.apache.org/jira/browse/HBASE-1777.
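Also, the "wrong key class: HLogKey is not class THLogKey" lines in the
full master log quoted at the bottom look like the transactional contrib
(THLog) was enabled at some point, so the hlogs on disk were written with
THLogKey while a plain region server is now trying to replay them. If that
is the case, the transactional settings have to be consistent on every
server. The fragment below is only my recollection of the transactional
contrib setup for hbase-site.xml; please double-check the property names
against the contrib docs before relying on them:

```xml
<!-- Hypothetical hbase-site.xml fragment (property names from memory,
     not verified): enables the transactional region server consistently,
     so hlogs are written and replayed with the same key class. -->
<property>
  <name>hbase.regionserver.class</name>
  <value>org.apache.hadoop.hbase.ipc.TransactionalRegionInterface</value>
</property>
<property>
  <name>hbase.regionserver.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegionServer</value>
</property>
```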

2009-09-16 09:02:59,846 FATAL org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog required. Forcing server shutdown
java.io.IOException: Server not running, aborting
        at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.exists(HRegionServer.java:1775)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2009-09-16 09:02:59,863 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/10.209.207.80:60020.compactor exiting
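For anyone skimming dumps like these, a throwaway script along the
following lines (not part of HBase; it just matches the stock log4j
timestamp/level/logger layout used here) pulls out the first entry at a
given level, which is how I spotted the FATAL line above:

```python
import re

# Matches the log4j layout used in the HBase/Hadoop dumps in this thread, e.g.
# "2009-09-16 09:02:59,846 FATAL org.apache...MemStoreFlusher: message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL) "
    r"(?P<logger>\S+): (?P<msg>.*)$"
)

def first_at_level(lines, level="FATAL"):
    """Return (timestamp, logger, message) of the first entry at `level`, else None."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") == level:
            return m.group("ts"), m.group("logger"), m.group("msg")
    return None

# Two lines lifted from the logs in this thread, as a quick check.
sample = [
    "2009-09-16 09:02:56,571 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping",
    "2009-09-16 09:02:59,846 FATAL org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog required. Forcing server shutdown",
]
print(first_at_level(sample))
```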

Lucas




On Wed, Sep 16, 2009 at 1:37 PM, Andrew Purtell <apurtell@apache.org> wrote:

> Lucas,
>
> > [...] ServerManager: 0 region servers, 1 dead, average load
>
> Is this a pseudo distributed test? I.e. everything running on one EC2
> instance?
>
> What instance type are you using?
>
> The reported exceptions indicate a possible problem with DFS. Are there
> exceptions in the DataNode logs? Or the NameNode log?
>
> Best regards,
>
>    - Andy
>
>
>
>
> ________________________________
> From: Lucas Nazário dos Santos <nazario.lucas@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Wednesday, September 16, 2009 6:21:27 AM
> Subject: HBase cracks on EC2
>
> Hi,
>
> I'm trying to migrate my solution to Amazon EC2, but the following errors
> crash HBase. I'm using HBase and Hadoop 0.20.0. Does anybody know what
> might be causing them?
>
> Lucas
>
>
>
> HBASE MASTER
> 2009-09-16 09:02:41,271 INFO org.apache.hadoop.hbase.master.ServerManager:
> 1
> region servers, 0 dead, average load 6.0
> 2009-09-16 09:02:41,453 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scanning meta region {server:
> 10.209.207.80:60020,
> regionname: -ROOT-,,0, startKey: <>}
> 2009-09-16 09:02:41,472 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scan of 1 row(s) of meta region {server:
> 10.209.207.80:60020, regionname: -ROOT-,,0, startKey: <>} complete
> 2009-09-16 09:02:41,553 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scanning meta region {server:
> 10.209.207.80:60020,
> regionname: .META.,,1, startKey: <>}
> 2009-09-16 09:02:41,604 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 4 row(s) of meta region {server:
> 10.209.207.80:60020, regionname: .META.,,1, startKey: <>} complete
> 2009-09-16 09:02:41,605 INFO org.apache.hadoop.hbase.master.BaseScanner:
> All
> 1 .META. region(s) scanned
> 2009-09-16 09:03:01,905 INFO org.apache.hadoop.hbase.master.ServerManager:
> domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743 znode expired
> 2009-09-16 09:03:01,921 INFO org.apache.hadoop.hbase.master.RegionManager:
> -ROOT- region unset (but not set to be reassigned)
> 2009-09-16 09:03:01,921 INFO org.apache.hadoop.hbase.master.RegionManager:
> META region removed from onlineMetaRegions
> 2009-09-16 09:03:01,923 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743:
> logSplit: false, rootRescanned: false, numberOfMetaRegions: 1,
> onlineMetaRegions.size(): 0
> 2009-09-16 09:03:01,935 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 2 hlog(s) in
>
> hdfs://domU-12-31-39-07-CC-A2.compute-1.internal:9000/hbase/.logs/domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743
> 2009-09-16 09:03:01,943 WARN org.apache.hadoop.hbase.regionserver.HLog:
> Exception processing
>
> hdfs://domU-12-31-39-07-CC-A2.compute-1.internal:9000/hbase/.logs/domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743/hlog.dat.1253103102168
> -- continuing. Possible DATA LOSS!
> java.io.IOException: wrong key class:
> org.apache.hadoop.hbase.regionserver.HLogKey is not class
> org.apache.hadoop.hbase.regionserver.transactional.THLogKey
>        at
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1824)
>        at
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1876)
>        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:880)
>        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:802)
>        at
>
> org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:274)
>        at
> org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:492)
>        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:426)
> 2009-09-16 09:03:01,951 WARN org.apache.hadoop.hbase.regionserver.HLog:
> Exception processing
>
> hdfs://domU-12-31-39-07-CC-A2.compute-1.internal:9000/hbase/.logs/domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743/hlog.dat.1253105692784
> -- continuing. Possible DATA LOSS!
> java.io.IOException: wrong key class:
> org.apache.hadoop.hbase.regionserver.HLogKey is not class
> org.apache.hadoop.hbase.regionserver.transactional.THLogKey
>        at
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1824)
>        at
> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1876)
>        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:880)
>        at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:802)
>        at
>
> org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:274)
>        at
> org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:492)
>        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:426)
> 2009-09-16 09:03:01,956 INFO org.apache.hadoop.hbase.regionserver.HLog:
> hlog
> file splitting completed in 29 millis for
>
> hdfs://domU-12-31-39-07-CC-A2.compute-1.internal:9000/hbase/.logs/domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743
> 2009-09-16 09:03:01,956 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete,
> meta reassignment and scanning:
> 2009-09-16 09:03:01,956 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: ProcessServerShutdown
> reassigning ROOT region
> 2009-09-16 09:03:01,956 INFO org.apache.hadoop.hbase.master.RegionManager:
> -ROOT- region unset (but not set to be reassigned)
> 2009-09-16 09:03:01,956 INFO org.apache.hadoop.hbase.master.RegionManager:
> ROOT inserted into regionsInTransition
> 2009-09-16 09:03:01,956 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: ProcessServerShutdown
> setting to unassigned: {server: 10.209.207.80:60020, regionname:
> .META.,,1,
> startKey: <>}
> 2009-09-16 09:03:41,273 INFO org.apache.hadoop.hbase.master.ServerManager:
> 0
> region servers, 1 dead, average load
> NaN[domU-12-31-39-07-CC-A2.compute-1.internal,60020,1253103101743]
>
>
>
> HBASE REGIONSERVER
> 2009-09-16 09:02:56,571 WARN org.apache.hadoop.hdfs.DFSClient:
> NotReplicatedYetException sleeping
> /hbase/news/698339833/charset/3497317581208054515 retries left 1
> 2009-09-16 09:02:59,844 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /hbase/news/698339833/charset/3497317581208054515 could only be replicated
> to 0 nodes, instead of 1
>        at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
>        at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>        at java.lang.reflect.Method.invoke(Unknown Source)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Unknown Source)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:739)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at $Proxy1.addBlock(Unknown Source)
>        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>        at java.lang.reflect.Method.invoke(Unknown Source)
>        at
>
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at
>
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy1.addBlock(Unknown Source)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2875)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2757)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2048)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2234)
>
> 2009-09-16 09:02:59,844 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block null bad datanode[0] nodes == null
> 2009-09-16 09:02:59,844 WARN org.apache.hadoop.hdfs.DFSClient: Could not
> get
> block locations. Source file
> "/hbase/news/698339833/charset/3497317581208054515" - Aborting...
> 2009-09-16 09:02:59,846 FATAL
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog
> required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region:
> news,,1253103842893
>        at
>
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:950)
>        at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:843)
>        at
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:241)
>        at
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:149)
> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /hbase/news/698339833/charset/3497317581208054515 could only be replicated
> to 0 nodes, instead of 1
>        at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
>        at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>        at java.lang.reflect.Method.invoke(Unknown Source)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Unknown Source)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:739)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at $Proxy1.addBlock(Unknown Source)
>        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>        at java.lang.reflect.Method.invoke(Unknown Source)
>        at
>
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at
>
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy1.addBlock(Unknown Source)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2875)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2757)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2048)
>        at
>
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2234)
> 2009-09-16 09:02:59,849 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> request=0.0, regions=6, stores=27, storefiles=36, storefileIndexSize=0,
> memstoreSize=0, usedHeap=195, maxHeap=996, blockCacheSize=82678352,
> blockCacheFree=126368384, blockCacheCount=1168, blockCacheHitRatio=99
> 2009-09-16 09:02:59,849 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
> regionserver/10.209.207.80:60020.cacheFlusher exiting
> 2009-09-16 09:02:59,849 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 1086 on 60020, call exists([B@466057, row=
> http://www.bloomberg.com/news/commentary/marinis.html#, maxVersions=1,
> timeRange=[0,9223372036854775807), families=ALL) from 10.209.207.80:47866:
> error: java.io.IOException: Server not running, aborting
> java.io.IOException: Server not running, aborting
>        at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299)
>        at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.exists(HRegionServer.java:1775)
>        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>        at java.lang.reflect.Method.invoke(Unknown Source)
>        at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650)
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> 2009-09-16 09:02:59,863 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> regionserver/10.209.207.80:60020.compactor exiting
> 2009-09-16 09:03:00,027 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
> server on 60020
> 2009-09-16 09:03:00,028 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 1 on 60020: exiting
> 2009-09-16 09:03:00,039 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 1092 on 60020: exiting
> 2009-09-16 09:03:00,042 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
> IPC
> Server listener on 60020
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 3 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 4 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 7 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 8 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 9 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 10 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 18 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 19 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 20 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 22 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 23 on 60020: exiting
> 2009-09-16 09:03:00,043 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 24 on 60020: exiting
> 2009-09-16 09:03:00,044 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 25 on 60020: exiting
> 2009-09-16 09:03:00,044 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 26 on 60020: exiting
> 2009-09-16 09:03:00,044 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 27 on 60020: exiting
> 2009-09-16 09:03:00,044 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 29 on 60020: exiting
> 2009-09-16 09:03:00,044 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 30 on 60020: exiting
> 2009-09-16 09:03:01,898 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> 2009-09-16 09:03:01,898 INFO org.apache.zookeeper.ZooKeeper: Closing
> session: 0x123c2c3d8bb0001
> 2009-09-16 09:03:01,899 INFO org.apache.zookeeper.ClientCnxn: Closing
> ClientCnxn for session: 0x123c2c3d8bb0001
> 2009-09-16 09:03:01,905 INFO org.apache.zookeeper.ClientCnxn: Exception
> while closing send thread for session 0x123c2c3d8bb0001 : Read error rc =
> -1
> java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
> 2009-09-16 09:03:02,008 INFO org.apache.zookeeper.ClientCnxn: Disconnecting
> ClientCnxn for session: 0x123c2c3d8bb0001
> 2009-09-16 09:03:02,008 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x123c2c3d8bb0001 closed
> 2009-09-16 09:03:02,008 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2009-09-16 09:03:02,110 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
> 10.209.207.80:60020 exiting
> 2009-09-16 09:03:03,603 INFO org.apache.hadoop.hbase.Leases:
> regionserver/10.209.207.80:60020.leaseChecker closing leases
> 2009-09-16 09:03:03,603 INFO org.apache.hadoop.hbase.Leases:
> regionserver/10.209.207.80:60020.leaseChecker closed leases
> 2009-09-16 09:03:03,604 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
> thread.
> 2009-09-16 09:03:03,604 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread
> complete
>
>
>
>
>
