hbase-user mailing list archives

From: Vimal Jain <vkj...@gmail.com>
Subject: Re: Exceptions in log files of Hadoop and Hbase
Date: Tue, 13 Aug 2013 15:47:46 GMT
Yes, that is the reason I am planning to make it 8 GB.
I am running in pseudo-distributed mode as I will expand to a 2-3 node
cluster as my data size increases in the future.
Also, I have disabled swapping (I have set vm.swappiness to 0).
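
For reference, a minimal sketch of how that swappiness setting is usually
applied on a Linux box (assuming root access; purely illustrative):

    # apply immediately (does not survive a reboot)
    sysctl -w vm.swappiness=0

    # persist across reboots
    echo "vm.swappiness = 0" >> /etc/sysctl.conf
    sysctl -p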


On Tue, Aug 13, 2013 at 9:12 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Vimal,
>
> 4GB for all the processes is very tight... You might want to run in
> standalone mode instead of pseudo-distributed. That will save you some
> memory. Have you checked whether your server is swapping? That will make it
> slow, and then you will miss some heartbeats and processes will close...
>
> JM
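
A quick way to check whether the box is actually swapping, using standard
Linux tools (just a sketch, nothing HBase-specific):

    # overall memory and swap usage, in MB
    free -m

    # sample every 5 seconds; sustained non-zero si/so columns mean the
    # machine is actively swapping pages in and out
    vmstat 5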
>
> 2013/8/13 Vimal Jain <vkjk89@gmail.com>
>
> > Hi Jean-Marc,
> > Thanks for your reply.
> > I have a one-node cluster (pseudo-distributed mode), so 192.168.20.30 is
> > the only server, and it hosts all 6 processes
> > (NameNode, DataNode, SecondaryNameNode, HMaster, HRegionServer and
> > ZooKeeper).
> >
> > At the time of this problem, I had given the following memory to these
> > processes (I had a total of 4 GB of memory on this box):
> >
> > NameNode, DataNode, SecondaryNameNode - 1 GB (default)
> > HMaster, ZooKeeper - 512 MB
> > HRegionServer - 1.5 GB
> >
> > So the total memory allocation is 5.5 GB, which exceeds physical memory.
> > (I am planning to increase the RAM on this system to 8 GB and change the
> > allocation to 2 GB for the HRegionServer and 1 GB for each of the other 5.)
> > So could that be a problem?
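
For what it's worth, a sketch of where those per-daemon heaps would
typically be set once the box has 8 GB; the values below just mirror the
plan above (1 GB each, 2 GB for the RegionServer) and are not a tuning
recommendation:

    # conf/hadoop-env.sh (Hadoop 1.x) -- HDFS daemons
    export HADOOP_NAMENODE_OPTS="-Xmx1024m $HADOOP_NAMENODE_OPTS"
    export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx1024m $HADOOP_SECONDARYNAMENODE_OPTS"
    export HADOOP_DATANODE_OPTS="-Xmx1024m $HADOOP_DATANODE_OPTS"

    # conf/hbase-env.sh (HBase 0.94) -- HBase daemons
    export HBASE_MASTER_OPTS="-Xmx1024m $HBASE_MASTER_OPTS"
    export HBASE_REGIONSERVER_OPTS="-Xmx2048m $HBASE_REGIONSERVER_OPTS"
    export HBASE_ZOOKEEPER_OPTS="-Xmx1024m $HBASE_ZOOKEEPER_OPTS"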
> > I bring up the memory because I see the following in the HMaster log:
> >
> > *org.apache.hadoop.hbase.util.Sleeper: We slept 1266874891120ms instead of
> > 10000ms, this is likely due to a long garbage collecting pause and it's
> > usually bad*
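
If those pauses keep showing up, GC logging in hbase-env.sh would confirm
them; a minimal sketch using standard HotSpot flags (the log path is only an
example):

    # conf/hbase-env.sh -- log every collection with timestamps
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"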
> >
> >
> >
> > On Tue, Aug 13, 2013 at 6:42 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> > > Hi Vimal,
> > >
> > > What was your cluster doing at that time? Was it very busy? It looks like
> > > one server (192.168.20.30:50010) got so busy that it failed to report in
> > > and was closed.
> > >
> > > JM
> > >
> > > 2013/8/13 Vimal Jain <vkjk89@gmail.com>
> > >
> > > > Hi,
> > > > I am using Hadoop and HBase in pseudo-distributed mode.
> > > > I am using Hadoop version 1.1.2 and HBase version 0.94.7.
> > > >
> > > > Recently I found some exceptions in the Hadoop and HBase logs.
> > > > I am not sure what has caused them.
> > > > Requesting your help here.
> > > >
> > > >
> > > > *Exception in Master log:*
> > > > 2013-07-31 15:51:04,694 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1266874891120ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 15:51:04,798 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
> > > > org.apache.hadoop.hbase.client.ScannerTimeoutException: 82253ms passed since the last invocation, timeout is currently set to 60000
> > > >     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:283)
> > > >     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:727)
> > > >     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:184)
> > > >     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:169)
> > > >     at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:123)
> > > >     at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:134)
> > > >     at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:92)
> > > >     at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
> > > >     at java.lang.Thread.run(Thread.java:662)
> > > > Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: -8839286818925700393
> > > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2544)
> > > >     at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > > >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > > >     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> > > >     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> > > >     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> > > >     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
> > > >     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:143)
> > > >     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:42)
> > > >     at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:164)
> > > >     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:274)
> > > >     ... 8 more
> > > > 2013-07-31 15:54:42,526 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key ''
> > > > 2013-07-31 15:54:42,526 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Advancing internal scanner to startKey at ''
> > > > 2013-07-31 15:54:42,531 DEBUG org.apache.hadoop.hbase.client.ClientScanner: Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
> > > > 2013-07-31 15:54:42,532 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 5 catalog row(s) and gc'd 0 unreferenced parent region(s)
> > > > 2013-07-31 15:54:42,751 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=5 average=5.0 mostloaded=5 leastloaded=5
> > > >
> > > > 2013-07-31 16:43:23,358 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 25771ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,358 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 30091ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,361 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 28613ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,361 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 27457ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,362 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 34587ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,367 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 78600ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-07-31 16:43:23,369 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 78562ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
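
Side note on the zkexpired link above: the setting that governs when
ZooKeeper declares a session expired is zookeeper.session.timeout in
hbase-site.xml; the value below is only an illustration, not a
recommendation.

    <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value>
    </property>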
> > > >
> > > >
> > > > *Exception in Region log:*
> > > > java.io.IOException: Reflection
> > > >     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:304)
> > > >     at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1375)
> > > >     at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1319)
> > > >     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1480)
> > > >     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1271)
> > > >     at java.lang.Thread.run(Thread.java:662)
> > > > Caused by: java.lang.reflect.InvocationTargetException
> > > >     at sun.reflect.GeneratedMethodAccessor156.invoke(Unknown Source)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:302)
> > > >     ... 5 more
> > > > Caused by: java.io.IOException: DFSOutputStream is closed
> > > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3669)
> > > >     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> > > >     at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:995)
> > > >     ... 9 more
> > > > 2013-07-31 15:50:37,761 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
> > > >
> > > > 2013-07-31 19:23:38,289 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /hbase/.logs/hbase.xyz.com,60020,1370497170634/hbase.xyz.com%2C60020%2C1370497170634.1375265949987 retrying...
> > > > 2013-07-31 19:23:38,289 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /hbase/.logs/hbase.xyz.com,60020,1370497170634/hbase.xyz.com%2C60020%2C1370497170634.1375265949987 retrying...
> > > >
> > > >
> > > > *Exception in data node:*
> > > >
> > > > 2013-07-31 19:39:51,094 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
> > > >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:4188)
> > > >     at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:1069)
> > > >     at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
> > > >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
> > > >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
> > > >     at java.security.AccessController.doPrivileged(Native Method)
> > > >     at javax.security.auth.Subject.doAs(Subject.java:396)
> > > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> > > >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> > > >     at org.apache.hadoop.ipc.Client.call(Client.java:1107)
> > > >     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
> > > >     at sun.proxy.$Proxy5.blockReceived(Unknown Source)
> > > >     at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1006)
> > > >     at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1527)
> > > >     at java.lang.Thread.run(Thread.java:662)
> > > >
> > > > *Exception in name node:*
> > > >
> > > > 2013-07-31 19:39:50,671 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.blockReceived: blk_-4787262105551508952_28369 is received from dead or unregistered node 192.168.20.30:50010
> > > > 2013-07-31 19:39:50,671 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
> > > > 2013-07-31 19:39:50,671 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call blockReceived(DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020), [Lorg.apache.hadoop.hdfs.protocol.Block;@64f2d559, [Ljava.lang.String;@294f9d6) from 192.168.20.30:59764: error: java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
> > > > java.io.IOException: Got blockReceived message from unregistered or dead node blk_-4787262105551508952_28369
> > > >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:4188)
> > > >     at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:1069)
> > > >     at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
> > > >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
> > > >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
> > > >     at java.security.AccessController.doPrivileged(Native Method)
> > > >     at javax.security.auth.Subject.doAs(Subject.java:396)
> > > >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> > > >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks and Regards,
> > > > Vimal Jain
> > > >
> > >
> >
> >
> >
> > --
> > Thanks and Regards,
> > Vimal Jain
> >
>



-- 
Thanks and Regards,
Vimal Jain
