hbase-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: HBase cluster design
Date Tue, 13 May 2014 10:14:13 GMT
So, just to summarize the result of this discussion:
do you confirm that the latest version of HBase should (in theory) support
mapreduce jobs on tables that could, in the meantime, be updated by external
processes (i.e. not by the mapred job itself)?
One of the answers said: "Poorly tuned HBase clusters can
fail easily under heavy load".
Could you suggest some tuning to avoid HBase crashing in such
situations?
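(To give an idea of what I mean by tuning, I'm thinking of hbase-site.xml
settings along these lines; the property names below are from the 0.92-era
configuration and the values are only illustrative starting points, not
recommendations for any specific cluster:)

```xml
<!-- Illustrative hbase-site.xml fragment; values are starting points only. -->
<property>
  <!-- Tolerate longer GC pauses before ZooKeeper declares the region server dead. -->
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
<property>
  <!-- More RPC handlers to absorb bursts of concurrent client writes. -->
  <name>hbase.regionserver.handler.count</name>
  <value>30</value>
</property>
<property>
  <!-- Fraction of the heap that memstores may use before writes are blocked. -->
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>
</property>
```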

Best,
Flavio


On Fri, Apr 11, 2014 at 12:06 PM, Flavio Pompermaier
<pompermaier@okkam.it> wrote:

> Today I was able to catch an error during a mapreduce job that more or less
> mimics a rowCount.
> The error I saw is:
>
> Could not sync. Requesting close of hlog
> java.io.IOException: Reflection
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1141)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1245)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1100)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.reflect.InvocationTargetException
> 	at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:228)
> 	... 4 more
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on /hbase/.logs/host4,60020,1395928532020/host4%2C60020%2C1395928532020.1397205288300
> File does not exist. Holder DFSClient_NONMAPREDUCE_-1746149332_40 does not have any open files.
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2308)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2299)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2095)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1160)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> 	at $Proxy14.addBlock(Unknown Source)
> 	at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> 	at $Proxy14.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
>
>
> What can be the cause of this error?
>
>
> On Sat, Apr 5, 2014 at 2:25 PM, Michael Segel <michael_segel@hotmail.com> wrote:
>
>> You have one other thing to consider.
>>
>> Did you oversubscribe on the m/r tuning side of things?
>>
>> Many people want to confine HBase to a portion of the cluster.
>> That should be the exception, not the primary cluster design.
>>
>> If you oversubscribe your cluster, you will run out of memory, then you
>> need to swap, and then bad things happen.
>>
>> Also, while many suggest not reserving room for swap... I suggest that
>> you do leave some room.
>>
>> While this doesn't address the issues in your question directly, these are
>> things that you need to consider.
>>
>> More to your point...
>> Poorly tuned HBase clusters can fail easily under heavy load.
>>
>> While Ted doesn't address this consideration, it can become an issue.
>>
>> YMMV of course.
>>
>>
>>
>> On Apr 4, 2014, at 9:43 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>> > The 'Connection refused' message was logged at WARN level.
>> >
>> > If you can pastebin more of the region server log before its crash, I
>> > would take a deeper look.
>> >
>> > BTW I assume your zookeeper quorum was healthy during that period of
>> > time.
>> >
>> >
>> > On Fri, Apr 4, 2014 at 7:29 AM, Flavio Pompermaier
>> > <pompermaier@okkam.it> wrote:
>> >
>> >> Yes, I know I should update HBase; this is something I'm going to do
>> >> really soon. Bad me..
>> >> I just wanted to know if the fact of adding/updating rows in HBase
>> >> while running a mapred job could be problematic or not..
>> >> From what you told me it's not, so the problem could be caused by the
>> >> old version of HBase or some other OS configuration.
>> >> The update was performed via an application accessing HBase directly,
>> >> adding and updating rows of the table.
>> >> Once in a while some region server goes down and is marked as being in
>> >> a "bad state" by Cloudera, so I have to restart it.
>> >>
>> >> The error I usually see is:
>> >>
>> >> 2012-11-23 12:41:00,468 WARN org.apache.zookeeper.ClientCnxn: Session
>> >> 0x13b2cf447fd0000 for server null, unexpected error, closing socket
>> >> connection and attempting reconnect
>> >> java.net.ConnectException: Connection refused
>> >>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>> >>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>> >>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
>> >>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1047)
>> >>
>> >> Best,
>> >> Flavio
>> >>
>> >> On Fri, Apr 4, 2014 at 2:35 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> >>
>> >>> Was the updating performed by one of the mapreduce jobs?
>> >>> HBase should be able to serve multiple mapreduce jobs in the same
>> >>> cluster.
>> >>>
>> >>> Can you provide more detail on the crash ?
>> >>>
>> >>> BTW, there are 3 major releases after 0.92.
>> >>> Please consider upgrading your cluster to a newer release.
>> >>>
>> >>> Cheers
>> >>>
>> >>> On Apr 4, 2014, at 3:08 AM, Flavio Pompermaier <pompermaier@okkam.it>
>> >>> wrote:
>> >>>
>> >>>> Hi to everybody,
>> >>>>
>> >>>> I have a probably stupid question: is it a problem to run many
>> >>>> mapreduce jobs on the same HBase table at the same time? And multiple
>> >>>> jobs on different tables on the same cluster?
>> >>>> Should I use Hoya to get better cluster usage..?
>> >>>>
>> >>>> In my current cluster I noticed that the region servers tend to go
>> >>>> down if I run a mapreduce job while updating (maybe it could be
>> >>>> related to the old version of HBase I'm currently running:
>> >>>> 0.92.1-cdh4.1.2).
>> >>>>
>> >>>> Best,
>> >>>> Flavio
>> >>>
>> >>
>>
>> The opinions expressed here are mine, while they may reflect a cognitive
>> thought, that is purely accidental.
>> Use at your own risk.
>> Michael Segel
>> michael_segel (AT) hotmail.com
>>
