hbase-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: HBase cluster design
Date Fri, 11 Apr 2014 10:06:46 GMT
Today I was able to catch an error during a mapreduce job that more or less
mimics the rowCount.
The error I saw is:

Could not sync. Requesting close of hlog
java.io.IOException: Reflection
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1141)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1245)
	at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1100)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:228)
	... 4 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /hbase/.logs/host4,60020,1395928532020/host4%2C60020%2C1395928532020.1397205288300
File does not exist. Holder DFSClient_NONMAPREDUCE_-1746149332_40 does
not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2308)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2299)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2095)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

	at org.apache.hadoop.ipc.Client.call(Client.java:1160)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
	at $Proxy14.addBlock(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
	at $Proxy14.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)


What can be the cause of this error?
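
For context on the trace above: a LeaseExpiredException on a file under
/hbase/.logs usually means HDFS lease recovery was started on the WAL by some
other process — typically the master splitting the logs of a region server it
believed was dead, often because a long GC pause expired that server's
ZooKeeper session while the process was in fact still alive and trying to
sync. One knob commonly tuned in that scenario is the session timeout; the
sketch below is purely illustrative, and the value shown is an assumption,
not a recommendation:

```xml
<!-- hbase-site.xml: illustrative sketch only.
     A session timeout shorter than your worst-case GC pause lets the
     master declare the region server dead and recover (move) its WAL,
     after which the old process can no longer sync the file. -->
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; pick a value comfortably above your longest GC pause -->
  <value>180000</value>
</property>
```

Raising the timeout only masks the symptom, of course; the underlying GC
pauses or memory pressure still need to be addressed.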

On Sat, Apr 5, 2014 at 2:25 PM, Michael Segel <michael_segel@hotmail.com> wrote:

> You have one other thing to consider.
>
> Did you oversubscribe on the M/R tuning side of things?
>
> Many people want to confine HBase to a portion of the cluster.
> This should be the exception, not the primary cluster design.
>
> If you oversubscribe your cluster you will run out of memory, the nodes
> start to swap, and boom, bad things happen.
>
> Also, while many suggest not reserving room for swap, I suggest that you
> do leave some room.
>
> While this doesn't address the issues in your question directly, these are
> things that you need to consider.
>
> More to your point...
> Poorly tuned HBase clusters can fail easily under heavy load.
>
> While Ted doesn't address this consideration, it can become an issue.
>
> YMMV of course.
>
>
>
> On Apr 4, 2014, at 9:43 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > The 'Connection refused' message was logged at WARN level.
> >
> > If you can pastebin more of the region server log from before its crash,
> > I would take a deeper look.
> >
> > BTW, I assume your ZooKeeper quorum was healthy during that period of
> > time.
> >
> >
> > On Fri, Apr 4, 2014 at 7:29 AM, Flavio Pompermaier
> > <pompermaier@okkam.it> wrote:
> >
> >> Yes, I know I should upgrade HBase; this is something I'm going to do
> >> really soon. Bad me..
> >> I just wanted to know whether adding/updating rows in HBase while
> >> running a mapred job could be problematic or not.
> >> From what you told me it's not, so the problem could be caused by the
> >> old version of HBase or some other OS configuration.
> >> The updates were performed by an application accessing HBase directly,
> >> adding and updating rows of the table.
> >> Once in a while some region servers go down and are marked as being in a
> >> "bad state" by Cloudera, so I have to restart them.
> >>
> >> The error I usually see is:
> >>
> >> 2012-11-23 12:41:00,468 WARN org.apache.zookeeper.ClientCnxn: Session
> >> 0x13b2cf447fd0000 for server null, unexpected error, closing socket
> >> connection and attempting reconnect
> >> java.net.ConnectException: Connection refused
> >>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >>        at
> >> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> >>        at
> >>
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
> >>        at
> >> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1047)
> >>
> >> Best,
> >> Flavio
> >>
> >> On Fri, Apr 4, 2014 at 2:35 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>
> >>> Was the updating performed by one of the mapreduce jobs?
> >>> HBase should be able to serve multiple mapreduce jobs in the same
> >> cluster.
> >>>
> >>> Can you provide more detail on the crash ?
> >>>
> >>> BTW, there are 3 major releases after 0.92.
> >>> Please consider upgrading your cluster to a newer release.
> >>>
> >>> Cheers
> >>>
> >>> On Apr 4, 2014, at 3:08 AM, Flavio Pompermaier <pompermaier@okkam.it>
> >>> wrote:
> >>>
> >>>> Hi to everybody,
> >>>>
> >>>> I have a probably stupid question: is it a problem to run many
> >>>> mapreduce jobs on the same HBase table at the same time? And multiple
> >>>> jobs on different tables on the same cluster?
> >>>> Should I use Hoya to get better cluster usage?
> >>>>
> >>>> In my current cluster I noticed that the region servers tend to go
> >>>> down if I run a mapreduce job while updating (maybe it could be
> >>>> related to the old version of HBase I'm currently running:
> >>>> 0.92.1-cdh4.1.2).
> >>>>
> >>>> Best,
> >>>> Flavio
> >>>
> >>
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
>

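Michael's oversubscription warning above can be made concrete with a quick
back-of-the-envelope check. All the numbers here are made-up assumptions for
illustration, not measurements from any real cluster:

```python
# Back-of-the-envelope check for the oversubscription point above.
# Every figure below is an illustrative assumption.

node_ram_gb = 48              # total RAM on a worker node
os_and_daemons_gb = 4         # OS, DataNode, TaskTracker overhead
regionserver_heap_gb = 12     # HBase RegionServer heap
map_slots = 8                 # configured map task slots
reduce_slots = 4              # configured reduce task slots
child_heap_gb = 2             # per-task heap (mapred.child.java.opts -Xmx)

# Worst case: every slot is occupied at once, alongside HBase and the OS.
worst_case = (os_and_daemons_gb + regionserver_heap_gb
              + (map_slots + reduce_slots) * child_heap_gb)

print(f"worst-case committed memory: {worst_case} GB of {node_ram_gb} GB")
# With these numbers the node survives (40 GB of 48 GB), but raising the
# slot counts or heaps pushes it over the edge: the node swaps under full
# load, GC pauses stretch, ZooKeeper sessions expire, and region servers die.
```

If the worst case exceeds physical RAM, the failure mode looks exactly like
the crashes described in this thread.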