hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Region server crashes when using replication
Date Tue, 22 Mar 2011 19:01:20 GMT


On Tue, Mar 22, 2011 at 11:51 AM, Eran Kutner <eran@gigya.com> wrote:
> Thanks, J-D.
> As for the first issue, why does this behavior make sense? What happens when
> the connection between the two clusters fails? Will the region servers of the
> primary fail as well, or at least be unable to start? Seems very
> radical.

The DNS entry should remain, so you won't get UnknownHostException but
ConnectionRefused instead. But that's a different issue: HBASE-3130

> Regarding the second issue, I didn't see anything else in the logs; it just
> seemed like it decided to shut down, but maybe I missed it. I will try to
> reproduce that and let you know if I succeed.

That'd be nice :)

> Regarding the timeout to detect a failed server, 3 minutes sounds like a
> very long time for a region server to be down. Obviously, during that time
> the data owned by that server is inaccessible. Is there a reason for this
> long timeout? Can it be configured?

We set it that high for people who try to push too much data to
clusters that are too small / badly configured and then end up with
crazy garbage collections. Have fun reading this series of blog posts:

Please also see the book about this configuration:
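For reference, here is a minimal sketch of what lowering that timeout could
look like in hbase-site.xml. It assumes the 3-minute value discussed above is
the ZooKeeper session timeout (zookeeper.session.timeout, in milliseconds);
the 60000 below is only an illustrative value, not a recommendation:

```xml
<!-- hbase-site.xml on the region servers and master -->
<!-- Assumption: the failure-detection delay is the ZooKeeper session timeout. -->
<!-- 60000 ms (1 minute) is a hypothetical example value. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>
```

A lower value means a dead region server is noticed sooner, but a long
garbage collection pause that exceeds it will get a live server declared dead.
The ZooKeeper ensemble can also cap the session timeout it actually grants, so
the effective value may differ from what is requested here.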
