hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Region server crashes when using replication
Date Tue, 22 Mar 2011 19:01:20 GMT
Inline.

J-D

On Tue, Mar 22, 2011 at 11:51 AM, Eran Kutner <eran@gigya.com> wrote:
> Thanks, J-D.
> As for the first issue, why does this behavior make sense? What happens when
> the connection between the two clusters fails? Will the region servers of the
> primary fail as well, or at least be unable to start? That seems very
> radical.

The DNS entry should remain, so you won't get UnknownHostException but
ConnectionRefused instead. But that's a different issue: HBASE-3130
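
To make the distinction concrete, here is a rough Python analogue (not HBase code) of the two failure modes. If the peer cluster's hostname stops resolving, the lookup fails before any connection is attempted. If the name still resolves but the peer's processes are down, the connection attempt itself is refused. `probe_peer` is a hypothetical helper; `socket.gaierror` and `ConnectionRefusedError` are the Python counterparts of Java's `UnknownHostException` and `ConnectException`.

```python
import socket


def probe_peer(host: str, port: int, timeout_s: float = 2.0) -> str:
    """Hypothetical helper: classify how a TCP connection to host:port turns out."""
    try:
        # create_connection() resolves the name first, then opens the TCP connection.
        with socket.create_connection((host, port), timeout=timeout_s):
            return "reachable"
    except socket.gaierror:
        # Name resolution failed: roughly Java's UnknownHostException.
        return "unknown-host"
    except ConnectionRefusedError:
        # The name resolved and the host answered, but nothing accepts
        # connections on that port (e.g. the peer's processes are down).
        return "connection-refused"
    except OSError:
        # Timeouts, unreachable hosts/networks and other socket errors
        # (what you may see if the whole machine is gone).
        return "other-error"
```

A refused connection fails fast, while a host that is completely gone usually shows up as a timeout instead, which is why the catch-all branch is kept separate.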

>
> Regarding the second issue, I didn't see anything else in the logs; it just
> seemed like it decided to shut down, but maybe I missed something. I will try to
> reproduce that and let you know if I succeed.

That'd be nice :)

>
> Regarding the timeout to detect a failed server, 3 minutes sounds like a
> very long time for a region server to be down. Obviously, during that time
> the data owned by that server is inaccessible. Is there a reason for this
> long timeout? Can it be configured?
>

We set it that high for people who try to push too much data into
clusters that are too small or badly configured and then end up with
crazy long garbage collection pauses. Have fun reading this series of blog posts:
http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/

Please also see the book's section on this configuration (zookeeper.session.timeout):
http://hbase.apache.org/book.html#recommended_configurations
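
One caveat when tuning this, sketched under stated assumptions: my understanding is that HBase passes zookeeper.session.timeout (3 minutes by default, per this thread) to ZooKeeper as the *requested* session timeout, and the ZooKeeper server clamps any request into [minSessionTimeout, maxSessionTimeout], which default to 2 × tickTime and 20 × tickTime. The sketch below models only that clamping rule; `effective_session_timeout_ms` and its parameters are hypothetical, and the real bounds depend on how your ZooKeeper ensemble is configured.

```python
from typing import Optional


def effective_session_timeout_ms(
    requested_ms: int,
    tick_time_ms: int = 2000,
    min_session_timeout_ms: Optional[int] = None,
    max_session_timeout_ms: Optional[int] = None,
) -> int:
    """Hypothetical model of how a ZooKeeper server bounds a requested session timeout.

    When not set explicitly, the bounds default to 2 * tickTime and 20 * tickTime.
    """
    lower = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    upper = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    # Clamp the client's request into [lower, upper].
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # A 3-minute request against stock ZooKeeper settings (tickTime = 2000 ms).
    print(effective_session_timeout_ms(180_000))  # 40000
    # The same request when the server's maxSessionTimeout has been raised.
    print(effective_session_timeout_ms(180_000, max_session_timeout_ms=180_000))  # 180000
```

In other words, on an external ensemble with stock settings a 3-minute request would be cut to 40 seconds, so the value you put in hbase-site.xml only takes full effect if it falls within the ZooKeeper server's bounds.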
