Subject: Re: high availability
From: Koert Kuipers <koert@tresata.com>
To: user@hadoop.apache.org
Date: Tue, 15 Oct 2013 15:11:51 -0400

Jing,
thanks for your answer.

if hbase with high availability is the desired goal, is it recommended to
remove sshfence? we do not plan to use hdfs for anything else.

i understood that the only downside of no fencing is that the old namenode
could still be serving read requests. could this negatively impact hbase
functionality, or worse, could it corrupt hbase somehow (not sure how that
would be...)?

thanks! koert


On Tue, Oct 15, 2013 at 12:38 AM, Jing Zhao <jing@hortonworks.com> wrote:
> "it is unclear to me if the transition in this case is also rapid but
> the fencing takes long while the new namenode is already active, or if
> in this period i am stuck without an active namenode."
>
> The standby->active transition will get stuck in this period, i.e.,
> the NN can only become active after fencing the old active NN.
> During this period, since the only NN is in standby state and cannot
> handle usual R/W operations (it just throws StandbyException), an hbase
> region server may kill itself in some cases, I guess.
>
> I think you can remove sshfence from the configuration if you are
> using QJM-based HA.
>
> On Fri, Oct 11, 2013 at 4:51 PM, Koert Kuipers <koert@tresata.com> wrote:
> > i have been playing with high availability using journalnodes and 2
> > masters, both running namenode and hbase master.
> >
> > when i kill the namenode and hbase-master processes on the active
> > master, the failover is perfect. hbase never stops and a running
> > map-reduce job keeps going. this is impressive!
> >
> > however, when instead of killing the processes i kill the entire
> > active master machine, the transition is less smooth and can take a
> > long time, at least it seems this way in the logs. this is because
> > ssh fencing fails but keeps trying. my fencing is configured as:
> >
> >   <property>
> >     <name>dfs.ha.fencing.methods</name>
> >     <value>
> >       sshfence
> >       shell(/bin/true)
> >     </value>
> >     <final>true</final>
> >   </property>
> >
> > it is unclear to me if the transition in this case is also rapid but
> > the fencing takes long while the new namenode is already active, or if
> > in this period i am stuck without an active namenode. it is hard to
> > accurately test this in my setup.
> > is this supposed to take this long? is HDFS writable in this period?
> > and is hbase supposed to survive this long transition?
> >
> > thanks! koert
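
To make Jing's suggestion concrete: with QJM-based HA the journal quorum
already guarantees that only one NN can write the edit log, so the sshfence
entry can be dropped. HDFS still expects dfs.ha.fencing.methods to be set
for a failover to proceed, which is why a fence that always succeeds stays
behind. A minimal sketch of the resulting hdfs-site.xml block, under those
assumptions:

  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
    <final>true</final>
  </property>

Since shell(/bin/true) always succeeds, a failover to a dead master no
longer stalls on ssh timeouts; the remaining exposure is the stale-read
window discussed above.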
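On the "is HDFS writable in this period" question: an HA-configured client
does not fail outright while no NN is active; it cycles between the two
namenodes through its failover proxy provider, absorbing StandbyException
and retrying with backoff until one of them reports active (or the client's
retry limits are exhausted). A minimal sketch of the client-side
configuration, assuming a hypothetical nameservice "mycluster" with
namenodes nn1/nn2 on hosts master1/master2 (placeholder names, not from the
thread):

  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>master2:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>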
>
> it is unclear to me if the transition in this case is also rapid but t= he
> fencing takes long while the new namenode is already active, or if in = this
> period i am stuck without an active namenode. it is hard to accurately= test
> this in my setup.
> is this supposed to take this long? is HDFS writable in this period? a= nd is
> hbase supposed to survive this long transition?
>
> thanks! koert

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to=
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that=
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately=
and delete it from your system. Thank You.

--f46d04428ba064c11704e8cc5b1f--
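A small Java sketch of what that looks like from application code
(hypothetical probe, assuming fs.defaultFS points at the nameservice above):
during a failover window the create and write calls below block and retry
inside the DFS client rather than failing fast, which is why a long fencing
stall shows up as latency rather than as an immediate error:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HaWriteProbe {
      public static void main(String[] args) throws Exception {
          // picks up core-site.xml/hdfs-site.xml from the classpath,
          // including the HA nameservice and failover proxy provider
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          // while neither NN is active, this blocks and retries internally
          // (StandbyException is absorbed by the retry policy) instead of
          // failing immediately, up to the client's failover retry limits
          try (FSDataOutputStream out = fs.create(new Path("/tmp/ha-probe"))) {
              out.writeUTF("probe");
          }
          System.out.println("write completed");
      }
  }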