Subject: Re: hbase master retries to RS/DN
From: Jean-Daniel Cryans
To: user@hbase.apache.org
Date: Thu, 19 May 2011 14:22:04 -0700

The config and the retries you pasted are unrelated.

The former controls the number of retries when regions are moving and
the client must query .META. or -ROOT-.

The latter is the Hadoop RPC client retrying its connection. Looking at
the code, the config is ipc.client.connect.max.retries, from
https://github.com/apache/hadoop/blob/branch-0.20/src/core/org/apache/hadoop/ipc/Client.java#L631

J-D

On Thu, May 19, 2011 at 11:46 AM, Jack Levin wrote:
> Hello, we have a situation where, when an RS/DN crashes hard, the master
> is very slow to recover. We notice that it waits on these log lines:
>
> 2011-05-19 11:20:57,766 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 0 time(s).
> 2011-05-19 11:20:58,767 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 1 time(s).
> 2011-05-19 11:20:59,768 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 2 time(s).
> 2011-05-19 11:21:00,768 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 3 time(s).
> 2011-05-19 11:21:01,769 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 4 time(s).
> 2011-05-19 11:21:02,769 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 5 time(s).
> 2011-05-19 11:21:03,770 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 6 time(s).
> 2011-05-19 11:21:04,771 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 7 time(s).
> 2011-05-19 11:21:05,771 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 8 time(s).
> 2011-05-19 11:21:06,772 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 9 time(s).
>
> This set repeats multiple times for log splits. So I looked around
> and set this config:
>
>   <property>
>     <name>hbase.client.retries.number</name>
>     <value>2</value>
>     <description>Maximum retries. Used as maximum for all retryable
>     operations such as fetching of the root region from root region
>     server, getting a cell's value, starting a row update, etc.
>     Default: 10.
>     </description>
>   </property>
>
> Unfortunately, the next time a server died, it made no difference. Is this
> a known issue for 0.89? If so, was it resolved in 0.90.2?
>
> -Jack
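
To illustrate the point of the reply, here is a minimal, standard-library-only sketch of a bounded connect-retry loop. It is not the Hadoop code. The class name, the use of java.util.Properties as a stand-in for a Hadoop Configuration, and the stallMillis helper are all hypothetical. The fallback of 10 attempts and the 1-second pause are assumptions read off the quoted log (ten "Already tried" lines, about one second apart).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration only; not taken from Hadoop or HBase.
final class ConnectRetrySketch {
    // Key named in the reply as the one governing the connect retries.
    static final String IPC_MAX_RETRIES_KEY = "ipc.client.connect.max.retries";
    // Assumption: fallback inferred from the ten attempts in the quoted log.
    static final int ASSUMED_DEFAULT_MAX_RETRIES = 10;

    // Reads the retry bound; any other key (e.g. hbase.client.retries.number) is ignored.
    static int maxConnectRetries(Properties conf) {
        String raw = conf.getProperty(IPC_MAX_RETRIES_KEY);
        return raw == null ? ASSUMED_DEFAULT_MAX_RETRIES : Integer.parseInt(raw.trim());
    }

    // Builds the messages a bounded retry loop would log against an unreachable server.
    static List<String> simulateFailedConnect(String server, int maxRetries) {
        List<String> lines = new ArrayList<>();
        for (int tried = 0; tried < maxRetries; tried++) {
            lines.add("Retrying connect to server: " + server
                    + ". Already tried " + tried + " time(s).");
        }
        return lines;
    }

    // Rough estimate of time lost: retries per set * pause per retry * number of sets.
    static long stallMillis(int maxRetries, long pauseMillis, int sets) {
        return (long) maxRetries * pauseMillis * sets;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // The setting from the original message; the connect loop never reads it.
        conf.setProperty("hbase.client.retries.number", "2");

        int retries = maxConnectRetries(conf);
        for (String line : simulateFailedConnect("/10.103.7.22:50020", retries)) {
            System.out.println(line);
        }
        // Assumes a 1-second pause and, purely as an example, 3 repeated sets.
        System.out.println("Estimated stall: " + stallMillis(retries, 1000L, 3) + " ms");
    }
}
```

In this sketch, setting hbase.client.retries.number to 2 leaves the loop bound untouched because the loop reads a different key, which matches the behavior described in the original message. Where ipc.client.connect.max.retries has to be set so that the master's HDFS client picks it up depends on the deployment and is not covered here.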