Subject: Re: final the dfs.replication and fsck
From: Patai Sangbutsarakum <silvianhadoop@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 16 Oct 2012 00:27:40 -0700

Thank you so much for confirming that.

On Mon, Oct 15, 2012 at 9:25 PM, Harsh J wrote:
> Patai,
>
> My bad - that was on my mind but I missed noting it down in my earlier
> reply. Yes, you'd have to control that as well. 2 should be fine for
> smaller clusters.
>
> On Tue, Oct 16, 2012 at 5:32 AM, Patai Sangbutsarakum
> wrote:
>> Just want to share and check whether this makes sense.
>>
>> Jobs failed to run after I restarted the namenode, and the cluster
>> stopped complaining about under-replication.
>>
>> This is what I found in the log file:
>>
>> Requested replication 10 exceeds maximum 2
>> java.io.IOException: file
>> /tmp/hadoop-apps/mapred/staging/apps/.staging/job_201210151601_0494/job.jar.
>> Requested replication 10 exceeds maximum 2
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyReplication(FSNamesystem.java:1126)
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInternal(FSNamesystem.java:1074)
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:1059)
>>         at org.apache.hadoop.hdfs.server.namenode.NameNode.setReplication(NameNode.java:629)
>>         at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:143
>>
>> So I scanned through the xml config files, guessed that I should
>> change mapred.submit.replication from 10 to 2, and restarted again.
>>
>> That's when jobs could start running again.
>> Hopefully that change makes sense.
>>
>> Thanks
>> Patai
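For reference, the change Patai describes would look roughly like the
sketch below in mapred-site.xml (mapred.submit.replication is the MR1
property that sets the replication factor for job submission files such
as job.jar in the staging directory; its default of 10 is what tripped
the new maximum of 2):

    <property>
      <name>mapred.submit.replication</name>
      <value>2</value>
      <!-- Replication for job files (job.jar, job.xml) in the staging
           directory. Must not exceed dfs.replication.max; 2 matches the
           cap chosen for this small staging cluster. -->
    </property>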
>> On Mon, Oct 15, 2012 at 1:57 PM, Patai Sangbutsarakum
>> wrote:
>>> Thanks Harsh, dfs.replication.max does the magic!!
>>>
>>> On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth wrote:
>>>> Thank you, Harsh. I did not know about dfs.replication.max.
>>>>
>>>> On Mon, Oct 15, 2012 at 12:23 PM, Harsh J wrote:
>>>>>
>>>>> Hey Chris,
>>>>>
>>>>> The dfs.replication param is an exception to the <final> config
>>>>> feature. If one uses the FileSystem API, one can pass in any short
>>>>> value one wants the replication to be. This bypasses the
>>>>> configuration, and the configuration (replication being per-file)
>>>>> is also client-side.
>>>>>
>>>>> The right way for an administrator to enforce a "max" replication
>>>>> value at the create/setReplication level is to set
>>>>> dfs.replication.max to the desired value on the NameNode and
>>>>> restart it.
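A minimal sketch of the client-side behavior described above, assuming
the stock Hadoop FileSystem API (the class name and path are
illustrative, not from the thread): the client can request any
replication factor directly, and only dfs.replication.max on the
NameNode can refuse it.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationBypassSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // loads *-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/tmp/replication-sketch.txt"); // illustrative path

        // create(path, overwrite, bufferSize, replication, blockSize):
        // the short passed here wins over any dfs.replication in the
        // client's config files, <final> or not, because replication is
        // a per-file attribute chosen by the client.
        FSDataOutputStream out =
            fs.create(p, true, 4096, (short) 10, 64L * 1024 * 1024);
        out.writeBytes("hello\n");
        out.close();

        // Changing it after the fact works the same way; the NameNode
        // rejects the request only if it exceeds dfs.replication.max
        // (the "Requested replication 10 exceeds maximum 2" IOException
        // seen earlier in this thread).
        fs.setReplication(p, (short) 10);
        fs.close();
      }
    }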
>>>>>
>>>>> On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
>>>>> wrote:
>>>>> > Hello Patai,
>>>>> >
>>>>> > Has your configuration file change been copied to all nodes in
>>>>> > the cluster?
>>>>> >
>>>>> > Are there applications connecting from outside of the cluster?
>>>>> > If so, then those clients could have separate configuration files
>>>>> > or code setting dfs.replication (and other configuration
>>>>> > properties). These would not be limited by <final> declarations
>>>>> > in the cluster's configuration files. <final>true</final>
>>>>> > controls configuration file resource loading, but it does not
>>>>> > necessarily block different nodes or different applications from
>>>>> > running with completely different configurations.
>>>>> >
>>>>> > Hope this helps,
>>>>> > --Chris
>>>>> >
>>>>> > On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
>>>>> > wrote:
>>>>> >>
>>>>> >> Hi Hadoopers,
>>>>> >>
>>>>> >> I have
>>>>> >>
>>>>> >> <property>
>>>>> >>   <name>dfs.replication</name>
>>>>> >>   <value>2</value>
>>>>> >>   <final>true</final>
>>>>> >> </property>
>>>>> >>
>>>>> >> set in hdfs-site.xml in the staging environment cluster. While
>>>>> >> the staging cluster runs code that will later be deployed to
>>>>> >> production, that code tries to set dfs.replication to 3, 10, 50,
>>>>> >> or whatever value other than 2 the developers thought would fit
>>>>> >> the production environment.
>>>>> >>
>>>>> >> Even though I have already made dfs.replication final in the
>>>>> >> staging cluster, every time I run fsck on the staging cluster it
>>>>> >> still reports under-replication.
>>>>> >> I thought the final keyword meant that values in the job config
>>>>> >> would not be honored, but it doesn't seem so when I run fsck.
>>>>> >>
>>>>> >> I am on cdh3u4.
>>>>> >>
>>>>> >> Please suggest.
>>>>> >> Patai
>>>>>
>>>>> --
>>>>> Harsh J
>
> --
> Harsh J
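Putting the thread's resolution together: the enforcement Harsh
recommends lives on the server side. A sketch, assuming hdfs-site.xml
on the NameNode (values taken from the thread; a NameNode restart is
required):

    <property>
      <name>dfs.replication.max</name>
      <value>2</value>
      <!-- NameNode-side cap: create()/setReplication() requests above
           this value fail regardless of client configuration. -->
    </property>

After the restart, fsck can confirm that the cluster no longer reports
under-replicated blocks, e.g.:

    hadoop fsck / -files -blocks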