From: Harsh J
Date: Tue, 19 Jun 2012 20:56:58 +0530
Subject: Re: Split brain - is it possible in hadoop?
To: common-user@hadoop.apache.org

If your case isn't the HA NameNode feature from Apache Hadoop 2.0, then
there can't be a split-brain situation in your not-exactly-failover
solution, unless your VIP is also messed up between nodes (i.e. some
clients/DNs resolve the NN hostname as A and the others as B).

One way this is already prevented from causing harm is that a NameNode
automatically waits for proper, complete block reports from all DNs
(such that every block has at least one replica location listed) before
it is fully functional, and only then exits safemode. Until that point I
don't see clients getting anywhere (namespace edits aren't allowed
during safemode), and hence no damage done.

Note: When you have such VIP-based setups, you will also need to ensure
that before an NN auto-starts itself, it properly checks for the other's
existence.
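For illustration only, a start wrapper along these lines would cover both
points. The hostname, the 50070 web port and the dfshealth.jsp check below
are just placeholders/defaults I'm assuming for a generic VIP setup, not
something your HA tooling ships with:

    #!/bin/sh
    # Sketch of a guarded NameNode start for a VIP-style setup.
    # OTHER_NN_HTTP is a placeholder; 50070 is only the default NN web port.
    OTHER_NN_HTTP="other-nn-host:50070"

    # 1. Refuse to auto-start if the peer NameNode already answers on its web UI.
    if curl -sf "http://${OTHER_NN_HTTP}/dfshealth.jsp" > /dev/null; then
      echo "Peer NameNode still alive at ${OTHER_NN_HTTP}, refusing to start" >&2
      exit 1
    fi

    # 2. Start this NameNode.
    bin/hadoop-daemon.sh --config conf start namenode

    # 3. Block until block reports have come in and safemode has been exited,
    #    before pointing any clients/jobs at this NN.
    bin/hadoop dfsadmin -safemode wait

A real failover agent (Linux HA, Veritas, etc.) should of course do its
own fencing on top of a check like that.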
You don't want two NNs running different image copies, be it a problem
or not - it's going to lead to more confusion anyway.

On Tue, Jun 19, 2012 at 8:36 PM, hdev ml wrote:
> Hello Michael, thanks for responding. At the bottom of this email I have
> given the following scenario. This is my understanding of split brain
> and I am trying to simulate it, which is where I am getting problems.
>
> My understanding is that split brain happens because of timeouts on the
> main namenode. When the timeout occurs, the HA implementation - be it
> Linux HA, Veritas, etc. - thinks that the main namenode has died and
> starts the standby namenode. The standby namenode comes up, and then the
> main namenode returns from the timeout phase and keeps functioning as if
> nothing happened, giving rise to 2 namenodes in the cluster - split
> brain.
>
> On Tue, Jun 19, 2012 at 5:47 AM, Michael Segel wrote:
>
>> In your example, you only have one active Name Node. So how would you
>> encounter a 'split brain' scenario?
>> Maybe it would be better if you defined what you mean by a split brain?
>>
>> -Mike
>>
>> On Jun 18, 2012, at 8:30 PM, hdev ml wrote:
>>
>> > All hadoop contributors/experts,
>> >
>> > I am trying to simulate split brain in our installation. There are a
>> > few things we want to know:
>> >
>> > 1. Does data corruption happen?
>> > 2. If yes in #1, how do we recover from it?
>> > 3. What are the corrective steps to take in this situation, e.g.
>> > killing one namenode, etc.?
>> >
>> > To simulate this I took the following steps:
>> >
>> > 1. We already have a healthy test cluster consisting of 4 machines.
>> > One machine runs the namenode and a datanode, another runs the
>> > secondarynamenode and a datanode, the 3rd runs the jobtracker and a
>> > datanode, and the 4th just a datanode.
>> > 2. Copied the hadoop installation folder to a new location on a
>> > datanode.
>> > 3. Kept all configurations the same in the hdfs-site and core-site
>> > xmls, except that I renamed fs.default.name to a different URI.
>> > 4. The namenode directory - dfs.name.dir - pointed to the same shared
>> > NFS-mounted directory that the main namenode points to.
>> >
>> > I started this standby namenode using the following command:
>> > bin/hadoop-daemon.sh --config conf --hosts slaves start namenode
>> >
>> > It errored out saying that "the directory is already locked", which
>> > is expected behaviour: the directory has been locked by the original
>> > namenode.
>> >
>> > So I changed dfs.name.dir to some other folder and issued the same
>> > command. It fails with the message "namenode has not been formatted",
>> > which is also expected.
>> >
>> > This makes me think - does a split-brain situation really occur in
>> > hadoop?
>> >
>> > My understanding is that split brain happens because of timeouts on
>> > the main namenode. When the timeout occurs, the HA implementation -
>> > be it Linux HA, Veritas, etc. - thinks that the main namenode has
>> > died and starts the standby namenode. The standby namenode comes up,
>> > and then the main namenode returns from the timeout phase and keeps
>> > functioning as if nothing happened, giving rise to 2 namenodes in the
>> > cluster - split brain.
>> >
>> > Considering the error messages and the above understanding, I cannot
>> > point 2 different namenodes to the same directory, because the main
>> > namenode isn't responding but has locked the directory.
>> >
>> > So can I safely conclude that split brain does not occur in hadoop?
>> >
>> > Or am I missing any other situation where split brain happens and the
>> > namenode directory is not locked, thus allowing the standby namenode
>> > also to start up?
>> >
>> > Has anybody encountered this?
>> >
>> > Any help is really appreciated.
>> >
>> > Harshad
>>

-- 
Harsh J