hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: HBase cluster design
Date Sat, 05 Apr 2014 12:25:58 GMT
You have one other thing to consider. 

Did you oversubscribe on the MapReduce tuning side of things? 

Many people want to confine HBase to a portion of the cluster. 
That should be the exception, not the primary cluster design. 

If you oversubscribe your cluster, you will run out of memory, the machines start to swap, and
boom, bad things happen. 
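To make the oversubscription point concrete, here is a back-of-the-envelope check for a CDH4-era MR1 + HBase node. All of the numbers and the helper function below are illustrative assumptions for the sketch, not recommendations:

```python
# Rough worst-case memory check for a node running MR1 task slots
# alongside an HBase region server. Every figure here is a made-up
# example; plug in your own heap sizes and slot counts.

def worst_case_usage_gb(map_slots, reduce_slots, task_heap_gb,
                        rs_heap_gb, datanode_gb, tt_gb, os_gb):
    """RAM needed if every configured slot runs a task at full heap."""
    return ((map_slots + reduce_slots) * task_heap_gb
            + rs_heap_gb + datanode_gb + tt_gb + os_gb)

node_ram_gb = 48
usage = worst_case_usage_gb(map_slots=10, reduce_slots=4, task_heap_gb=2,
                            rs_heap_gb=12, datanode_gb=1, tt_gb=1, os_gb=2)
print(usage)  # 44 GB worst case on a 48 GB box: fits, with a little headroom
```

If the worst case exceeds physical RAM, the box will swap under full load, which is exactly the failure mode described above.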

Also, while many suggest not reserving room for swap... I suggest that you do leave some.
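On the swap point, a common companion setting at the OS level (sketched here with an illustrative value; exact paths and defaults vary by distro) is to keep a small swap area configured but discourage the kernel from using it:

```shell
# Keep some swap configured, but tell the kernel to avoid it unless
# memory pressure is severe. A low, non-zero swappiness lets the box
# degrade gradually instead of the OOM killer shooting a region server.
sysctl vm.swappiness=1

# Persist across reboots (path may vary by distro):
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
```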

While this doesn't address the issues in your question directly, these are things you
need to consider. 

More to your point... 
Poorly tuned HBase clusters can fail easily under heavy load. 
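One knob that shows up in exactly this failure mode (a region server losing its ZooKeeper session under heavy load and being declared dead) is the session timeout. A hedged sketch for hbase-site.xml; the 90-second value is illustrative, not a recommendation:

```xml
<!-- hbase-site.xml: give a GC-pausing or overloaded region server more
     time to heartbeat before ZooKeeper expires its session.
     The value below is only an example. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>90000</value>
</property>
```

A longer timeout only masks the symptom, though; the real fix is not oversubscribing the node in the first place.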

While Ted doesn't address this consideration, it can become an issue. 

YMMV of course. 

On Apr 4, 2014, at 9:43 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> The 'Connection refused' message was logged at WARN level.
> If you can pastebin more of the region server log before its crash, I would
> take a deeper look.
> BTW I assume your zookeeper quorum was healthy during that period of time.
> On Fri, Apr 4, 2014 at 7:29 AM, Flavio Pompermaier <pompermaier@okkam.it>wrote:
>> Yes I know I should update HBase, this is something I'm going to do really
>> soon. Bad me..
>> I just wanted to know if the fact of adding/updating rows in HBase while
>> running a mapred job could be problematic or not..
>> From what you told me it's not, so the problem could be caused by the old
>> version of HBase or some other os configuration.
>> The update was performed via an application accessing HBase directly,
>> adding and updating rows of the table.
>> Once in a while some region servers goes down and marked as "bad state" by
>> Cloudera so I have to restart them.
>> The error I usually see is:
>> 2012-11-23 12:41:00,468 WARN org.apache.zookeeper.ClientCnxn: Session
>> 0x13b2cf447fd0000 for server null, unexpected error, closing socket
>> connection and attempting reconnect
>> java.net.ConnectException: Connection refused
>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1047)
>> Best,
>> Flavio
>> On Fri, Apr 4, 2014 at 2:35 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>> Was the updating performed by one of the mapreduce jobs ?
>>> HBase should be able to serve multiple mapreduce jobs in the same
>> cluster.
>>> Can you provide more detail on the crash ?
>>> BTW, there are 3 major releases after 0.92
>>> Please consider upgrading your cluster to newer release.
>>> Cheers
>>> On Apr 4, 2014, at 3:08 AM, Flavio Pompermaier <pompermaier@okkam.it>
>>> wrote:
>>>> Hi to everybody,
>>>> I have a probably stupid question: is it a problem to run many
>> mapreduce
>>>> jobs on the same HBase table at the same time? And multiple jobs on
>>>> different tables on the same cluster?
>>>> Should I use Hoya to have a better cluster usage..?
>>>> In my current cluster I noticed that the region servers tend to go down
>>> if
>>>> I run a mapreduce job while updating (maybe it could be related to the
>>> old
>>>> version of HBase I'm currently running: 0.92.1-cdh4.1.2).
>>>> Best,
>>>> Flavio

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com
