hadoop-common-user mailing list archives

From Amareshwari Sriramadasu <amar...@yahoo-inc.com>
Subject Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf
Date Thu, 19 Feb 2009 03:54:38 GMT
Yes. The configuration is read only when the TaskTracker starts.
You can see more discussion on JIRA HADOOP-5170
(http://issues.apache.org/jira/browse/HADOOP-5170) about making it a per-job setting.
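
Until that lands, the limit can only be lowered cluster-wide: set the property
in hadoop-site.xml on every TaskTracker node. A minimal sketch (the value 1 is
just illustrative):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>

The TaskTrackers then have to be restarted, e.g. with bin/hadoop-daemon.sh stop
tasktracker followed by bin/hadoop-daemon.sh start tasktracker on each node,
before the new value is picked up. (A note on the -D usage error is at the
bottom of this message.)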
-Amareshwari
jason hadoop wrote:
> I certainly hope it changes but I am unaware that it is in the todo queue at
> present.
>
> 2009/2/18 S D <sd.codewarrior@gmail.com>
>
>   
>> Thanks Jason. That's useful information. Are you aware of plans to change
>> this so that the maximum values can be changed without restarting the
>> server?
>>
>> John
>>
>> 2009/2/18 jason hadoop <jason.hadoop@gmail.com>
>>
>>     
>>> The .maximum values are only loaded by the Tasktrackers at server start
>>> time at present, and any changes you make will be ignored.
>>>
>>>
>>> 2009/2/18 S D <sd.codewarrior@gmail.com>
>>>
>>>       
>>>> Thanks for your response Rasit. You may have missed a portion of my
>>>> post:
>>>>
>>>>> On a different note, when I attempt to pass params via -D I get a
>>>>> usage message; when I use -jobconf the command goes through (and works
>>>>> in the case of mapred.reduce.tasks=0 for example) but I get a
>>>>> deprecation warning.
>>>>
>>>> I'm using Hadoop 0.19.0 and -D is not working. Are you using version
>>>> 0.19.0 as well?
>>>>
>>>> John
>>>>
>>>> On Wed, Feb 18, 2009 at 9:14 AM, Rasit OZDAS <rasitozdas@gmail.com> wrote:
>>>>
>>>>> John, did you try the -D option instead of -jobconf?
>>>>>
>>>>> I had the -D option in my code; I changed it to -jobconf, and this is
>>>>> what I get:
>>>>> ...
>>>>> ...
>>>>> Options:
>>>>>  -input    <path>     DFS input file(s) for the Map step
>>>>>  -output   <path>     DFS output directory for the Reduce step
>>>>>  -mapper   <cmd|JavaClassName>      The streaming command to run
>>>>>  -combiner <JavaClassName> Combiner has to be a Java class
>>>>>  -reducer  <cmd|JavaClassName>      The streaming command to run
>>>>>  -file     <file>     File/dir to be shipped in the Job jar file
>>>>>  -inputformat
>>>>> TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName
>>>>> Optional.
>>>>>  -outputformat TextOutputFormat(default)|JavaClassName  Optional.
>>>>>  -partitioner JavaClassName  Optional.
>>>>>  -numReduceTasks <num>  Optional.
>>>>>  -inputreader <spec>  Optional.
>>>>>  -cmdenv   <n>=<v>    Optional. Pass env.var to streaming commands
>>>>>  -mapdebug <path>  Optional. To run this script when a map task fails
>>>>>  -reducedebug <path>  Optional. To run this script when a reduce task fails
>>>>>  -verbose
>>>>>
>>>>> Generic options supported are
>>>>> -conf <configuration file>     specify an application configuration file
>>>>> -D <property=value>            use value for given property
>>>>> -fs <local|namenode:port>      specify a namenode
>>>>> -jt <local|jobtracker:port>    specify a job tracker
>>>>> -files <comma separated list of files>    specify comma separated files
>>>>> to be copied to the map reduce cluster
>>>>> -libjars <comma separated list of jars>    specify comma separated jar
>>>>> files to include in the classpath.
>>>>> -archives <comma separated list of archives>    specify comma separated
>>>>> archives to be unarchived on the compute machines.
>>>>>
>>>>> The general command line syntax is
>>>>> bin/hadoop command [genericOptions] [commandOptions]
>>>>>
>>>>> For more details about these options:
>>>>> Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info
>>>>>
>>>>>
>>>>>
>>>>> I think -jobconf is not used in v0.19.
>>>>>
>>>>> 2009/2/18 S D <sd.codewarrior@gmail.com>
>>>>>
>>>>>> I'm having trouble overriding the maximum number of map tasks that run
>>>>>> on a given machine in my cluster. The default value of
>>>>>> mapred.tasktracker.map.tasks.maximum is set to 2 in hadoop-default.xml.
>>>>>> When running my job I passed
>>>>>>
>>>>>> -jobconf mapred.tasktracker.map.tasks.maximum=1
>>>>>>
>>>>>> to limit map tasks to one per machine but each machine was still
>>>>>> allocated 2 map tasks (simultaneously). The only way I was able to
>>>>>> guarantee a maximum of one map task per machine was to change the value
>>>>>> of the property in hadoop-site.xml. This is unsatisfactory since I'll
>>>>>> often be changing the maximum on a per job basis. Any hints?
>>>>>>
>>>>>> On a different note, when I attempt to pass params via -D I get a usage
>>>>>> message; when I use -jobconf the command goes through (and works in the
>>>>>> case of mapred.reduce.tasks=0 for example) but I get a deprecation
>>>>>> warning.
>>>>>>
>>>>>> Thanks,
>>>>>> John
>>>>>>
>>>>>
>>>>> --
>>>>> M. Raşit ÖZDAŞ
>>>>>
>>>>>           
>
>   
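
A note on the -D usage error discussed above: the usage output gives the
general syntax as bin/hadoop command [genericOptions] [commandOptions], so
generic options such as -D (with a space before the property, as shown in the
usage) have to come before the streaming-specific options. If the usage
message was caused by ordering, something like the following should go
through; this is only a sketch, with the jar path taken from the usage text,
mapred.reduce.tasks=0 from the example, and the input/output paths and
/bin/cat mapper as placeholders:

  $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar \
      -D mapred.reduce.tasks=0 \
      -input  <input path> \
      -output <output path> \
      -mapper /bin/cat

This only helps for job-level properties like mapred.reduce.tasks;
mapred.tasktracker.map.tasks.maximum is still read only at TaskTracker start,
as noted at the top of this message.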

