hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: hbase master retries to RS/DN
Date Thu, 19 May 2011 21:22:04 GMT
The config and the retries you pasted are unrelated.

The former, hbase.client.retries.number, controls the number of retries
when regions are moving and the client must query .META. or -ROOT-.

The latter come from the Hadoop RPC client's connect retries, not from
HBase. Looking at the code, the config is ipc.client.connect.max.retries, from
https://github.com/apache/hadoop/blob/branch-0.20/src/core/org/apache/hadoop/ipc/Client.java#L631
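
As a rough sketch, lowering it could look like the entry below. The value 2
is only an illustration, and putting it in the master's hbase-site.xml is an
assumption about where your master picks up Hadoop client settings. The
default is 10, which matches the "Already tried 0" through "Already tried 9"
lines in your log. Note that it would apply to every Hadoop RPC connection
made by that process, not only the ones made during log splitting.

 <property>
   <name>ipc.client.connect.max.retries</name>
   <value>2</value>
 </property>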

J-D

On Thu, May 19, 2011 at 11:46 AM, Jack Levin <magnito@gmail.com> wrote:
> Hello, we have a situation where, when an RS/DN crashes hard, the master is
> very slow to recover. We notice that it waits on these log lines:
> 2011-05-19 11:20:57,766 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: /10.103.7.22:50020. Already tried 0 time(s).
> 2011-05-19 11:20:58,767 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: /10.103.7.22:50020. Already tried 1 time(s).
> 2011-05-19 11:20:59,768 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: /10.103.7.22:50020. Already tried 2 time(s).
> 2011-05-19 11:21:00,768 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: /10.103.7.22:50020. Already tried 3 time(s).
> 2011-05-19 11:21:01,769 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: /10.103.7.22:50020. Already tried 4 time(s).
> 2011-05-19 11:21:02,769 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: /10.103.7.22:50020. Already tried 5 time(s).
> 2011-05-19 11:21:03,770 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: /10.103.7.22:50020. Already tried 6 time(s).
> 2011-05-19 11:21:04,771 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: /10.103.7.22:50020. Already tried 7 time(s).
> 2011-05-19 11:21:05,771 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: /10.103.7.22:50020. Already tried 8 time(s).
> 2011-05-19 11:21:06,772 INFO org.apache.hadoop.ipc.Client: Retrying
> connect to server: /10.103.7.22:50020. Already tried 9 time(s).
>
> This set repeats multiple times for log splits.   So I look around,
> and set this config to be:
>
>  <property>
>    <name>hbase.client.retries.number</name>
>    <value>2</value>
>    <description>Maximum retries.  Used as maximum for all retryable
>    operations such as fetching of the root region from root region
>    server, getting a cell's value, starting a row update, etc.
>    Default: 10.
>    </description>
>  </property>
>
> Unfortunately, next time server died, it made no difference.  Is this
> a known issue for 0.89?  If so, was it resolved in 0.90.2?
>
> -Jack
>
