hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: final the dfs.replication and fsck
Date Tue, 16 Oct 2012 04:25:24 GMT
Patai,

My bad - that was on my mind but I missed noting it down in my earlier
reply. Yes, you'd have to control that as well. 2 should be fine for
smaller clusters.
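For reference, the setting being discussed could be pinned like this in mapred-site.xml; this is only a sketch, with the value 2 chosen to match the small cluster described in this thread:

```xml
<!-- mapred-site.xml: caps the replication used for job submission files
     (job.jar, job.xml, distributed cache entries). The default of 10
     exceeds dfs.replication.max=2 on this cluster, which triggers the
     "Requested replication 10 exceeds maximum 2" IOException below. -->
<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>
```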

On Tue, Oct 16, 2012 at 5:32 AM, Patai Sangbutsarakum
<silvianhadoop@gmail.com> wrote:
> Just want to share & check if this makes sense.
>
> Jobs failed to run after I restarted the namenode, and the cluster
> stopped complaining about under-replication.
>
> this is what i found in log file
>
> Requested replication 10 exceeds maximum 2
> java.io.IOException: file
> /tmp/hadoop-apps/mapred/staging/apps/.staging/job_201210151601_0494/job.jar.
> Requested replication 10 exceeds maximum 2
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyReplication(FSNamesystem.java:1126)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInternal(FSNamesystem.java:1074)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:1059)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.setReplication(NameNode.java:629)
>         at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:143
>
>
> So, I scanned through those xml config files, guessed I should change
> <name>mapred.submit.replication</name> from 10 to 2, and restarted again.
>
> That's when jobs started running again.
> Hopefully that change makes sense.
>
>
> Thanks
> Patai
>
> On Mon, Oct 15, 2012 at 1:57 PM, Patai Sangbutsarakum
> <silvianhadoop@gmail.com> wrote:
>> Thanks Harsh, dfs.replication.max does do the magic!!
>>
>> On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>>> Thank you, Harsh.  I did not know about dfs.replication.max.
>>>
>>>
>>> On Mon, Oct 15, 2012 at 12:23 PM, Harsh J <harsh@cloudera.com> wrote:
>>>>
>>>> Hey Chris,
>>>>
>>>> The dfs.replication param is an exception to the <final> config
>>>> feature. If one uses the FileSystem API, one can pass in any short
>>>> value they want the replication to be. This bypasses the
>>>> configuration, and the configuration (being per-file) is also
>>>> client-side.
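As an illustration of the client-side bypass Harsh describes, here is a hedged sketch using the HDFS FileSystem API; the path and values are hypothetical, and the buffer/block sizes are arbitrary placeholders:

```java
// Sketch only: shows how a client can request an arbitrary replication
// factor regardless of a final'd dfs.replication on the cluster.
// The NameNode will still reject values above dfs.replication.max.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // create(Path, overwrite, bufferSize, replication, blockSize):
    // the short replication argument overrides dfs.replication entirely.
    Path p = new Path("/tmp/replication-demo.txt"); // hypothetical path
    FSDataOutputStream out =
        fs.create(p, true, 4096, (short) 10, 64 * 1024 * 1024);
    out.close();

    // setReplication() can likewise request any short value.
    fs.setReplication(p, (short) 10);
  }
}
```

This is why `<final>true</final>` alone cannot enforce a ceiling: the final mechanism only governs how configuration resources are merged on the client, not what the client passes through the API.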
>>>>
>>>> The right way for an administrator to enforce a "max" replication
>>>> value at a create/setRep level, would be to set
>>>> the dfs.replication.max to a desired value at the NameNode and restart
>>>> it.
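Concretely, that enforcement might look like the following in the NameNode's hdfs-site.xml; the value 2 matches the cluster discussed here:

```xml
<!-- hdfs-site.xml on the NameNode: hard upper bound on replication.
     Any create()/setReplication() request above this value is rejected
     server-side, regardless of the client's own configuration. -->
<property>
  <name>dfs.replication.max</name>
  <value>2</value>
</property>
```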
>>>>
>>>> On Tue, Oct 16, 2012 at 12:48 AM, Chris Nauroth
>>>> <cnauroth@hortonworks.com> wrote:
>>>> > Hello Patai,
>>>> >
>>>> > Has your configuration file change been copied to all nodes in the
>>>> > cluster?
>>>> >
>>>> > Are there applications connecting from outside of the cluster?  If so,
>>>> > then
>>>> > those clients could have separate configuration files or code setting
>>>> > dfs.replication (and other configuration properties).  These would not
>>>> > be
>>>> > limited by final declarations in the cluster's configuration files.
>>>> > <final>true</final> controls configuration file resource loading,
>>>> > but it
>>>> > does not necessarily block different nodes or different applications
>>>> > from
>>>> > running with completely different configurations.
>>>> >
>>>> > Hope this helps,
>>>> > --Chris
>>>> >
>>>> >
>>>> > On Mon, Oct 15, 2012 at 12:01 PM, Patai Sangbutsarakum
>>>> > <silvianhadoop@gmail.com> wrote:
>>>> >>
>>>> >> Hi Hadoopers,
>>>> >>
>>>> >> I have
>>>> >> <property>
>>>> >>     <name>dfs.replication</name>
>>>> >>     <value>2</value>
>>>> >>     <final>true</final>
>>>> >>   </property>
>>>> >>
>>>> >> set in hdfs-site.xml in the staging environment cluster. While the
>>>> >> staging cluster is running the code that will later be deployed in
>>>> >> production, that code is trying to set dfs.replication to 3, 10, 50,
>>>> >> or anything other than 2 - the numbers the developers thought would
>>>> >> fit the production environment.
>>>> >>
>>>> >> Even though I have already marked dfs.replication final in the
>>>> >> staging cluster, every time I run fsck on the staging cluster it
>>>> >> still reports under-replication.
>>>> >> I thought the final keyword would make the cluster ignore the value
>>>> >> in the job config, but that doesn't seem to be the case when I run
>>>> >> fsck.
>>>> >>
>>>> >> I am on cdh3u4.
>>>> >>
>>>> >> please suggest.
>>>> >> Patai
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>



-- 
Harsh J
