Subject: Re: Journal Nodes in a multi-site environment
From: Olivier Renault <orenault@hortonworks.com>
To: user@ambari.apache.org
Cc: Suresh Srinivas, Rohit Bakhshi
Date: Wed, 11 Dec 2013 22:30:31 +0100

To get a quorum you need a majority of JournalNodes: N / 2 + 1, rounding
down. So with 3 JournalNodes, 2 of them must be up. If you've got 3
JournalNodes on each side (6 total), then after losing a DC you will never
reach quorum, since that would require 4 nodes. With a 2/1 split, if you
lose the DC holding the 2 JournalNodes, you could manually bring Hadoop up
in the second DC.
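To make the arithmetic concrete, here is a minimal Python sketch. The
two-site placements in it are only illustrative examples, not a
recommendation:

    # Quorum arithmetic for HDFS JournalNodes spread across two sites.
    def quorum(total_jns):
        """A majority of JournalNodes must ack each edit: N // 2 + 1."""
        return total_jns // 2 + 1

    def survives_dc_loss(jns_site_a, jns_site_b):
        """After losing one site, can the other still reach quorum?"""
        need = quorum(jns_site_a + jns_site_b)
        return jns_site_b >= need, jns_site_a >= need

    for a, b in [(2, 1), (3, 3), (2, 2)]:
        lost_a, lost_b = survives_dc_loss(a, b)
        print("%d+%d JNs: quorum=%d, survives loss of site A: %s, "
              "of site B: %s" % (a, b, quorum(a + b), lost_a, lost_b))

Whichever split you pick, at most one of the two sites can hold a majority,
so there is no two-site placement that survives the loss of either site.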
As a side note, Hadoop is not yet recommended as a multi-DC solution.

Hope it helps.

Olivier


On 11 December 2013 21:49, Jeff Sposetti <jeff@hortonworks.com> wrote:

> Adding in some Hadoop folks to chime in here.
>
> On Wed, Dec 11, 2013 at 5:35 AM, Chadwick Banning
> <chadwickbanning@gmail.com> wrote:
>
>> Hi all,
>>
>> I have an Ambari 1.4/HDP 2.0.6 environment that is split between two
>> data centers -- a main site and a recovery site. We have NameNode HA
>> enabled with automatic failover, and the problem we are facing is how
>> to divide the JournalNodes across both sites so that failover happens
>> appropriately.
>>
>> It seems that one site will always have a majority of the JournalNodes,
>> and if that site's NameNode were to go down, the other site's NameNode
>> would no longer be able to start, as it couldn't reach a majority of
>> the JournalNodes.
>>
>> Is there any way around this? I know an odd number of JournalNodes is
>> recommended, but what would happen if we placed an even number of
>> JournalNodes at each site?
>>
>> Thanks for any input!