Date: Thu, 6 Dec 2012 09:20:29 +0800
Subject: assignment - is master beeing a watchdog useful?
From: Andrew Purtell
To: "dev@hbase.apache.org"

My information here may be stale.
I remember we increased the timeout interval from 3 to 30 minutes, because the master injecting itself into mid-assignment often triggered races and led to double assignments and other bad stuff. At 30 minutes, this is not useful IMO. As an operator I'd run hbck to sort it out long before then.

On Thursday, December 6, 2012, Nicolas Liochon wrote:

> See comments in HBASE-7247: the master checks the time spent by the
> regionserver, and assigns the region to another if it takes too long. It adds
> complexity.
>
> from Stack: "I'm currently of the opinion that this expensive facility of
> master failing an open because it has been taking too long on a particular
> regionserver has been of no use – worse, it has only caused headache – but
> I may be just not remembering and others out on dev list will have better
> recall than I."
>
> So, opinions & memories are more than welcome.
> Removing this feature would be a huge simplification!
>
> Cheers,
>
> Nicolas
>

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)