hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: execute multiple MR jobs
Date Thu, 19 Nov 2009 05:29:57 GMT
The JobClient (0.18) / Job (0.20) class APIs should help you achieve this.
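As a rough illustration of the pattern (not actual Hadoop code), a driver can inspect the previous job's result and decide whether to launch the next one. The `run_first_job` function below is a hypothetical stand-in for a real `hadoop jar` invocation whose output (e.g. a counter or record count) drives the decision:

```shell
# Sketch of dynamic job sequencing. run_first_job is a hypothetical
# stand-in for launching a real MR job and reading back a result;
# here it just emits a number so the control flow is visible.
run_first_job() {
  echo 5   # pretend the first job reported 5 output records
}

records=$(run_first_job)
if [ "$records" -gt 0 ] ; then
  # stand-in for: hadoop jar myjobs.jar NextJob ...
  echo "previous job produced $records records: running next job"
else
  echo "previous job produced nothing: stopping the sequence"
fi
```

The same check can be done in Java with the Job API by reading counters after `waitForCompletion()` returns, which is what the JobClient/Job suggestion above amounts to.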


On 11/19/09 1:40 AM, "Gang Luo" <lgpublic@yahoo.com.cn> wrote:

Hi all,
I am going to execute multiple MapReduce jobs in sequence, but whether or not to execute a job in that sequence cannot be determined beforehand; it depends on the result of the previous job. Does anyone have ideas on how to do this "dynamically"?

P.S. I guess Cascading could help, but I have not grasped the point of Cascading yet. I would appreciate it if someone could give me some hints on this.

Gang Luo
Department of Computer Science
Duke University

----- Original Message ----
From: Edward Capriolo <edlinuxguru@gmail.com>
To: common-user@hadoop.apache.org
Sent: 2009/11/18 (Wed) 1:02:35 PM
Subject: Re: names or ips in rack awareness script?

On Wed, Nov 18, 2009 at 11:28 AM, Michael Thomas <thomas@hep.caltech.edu> wrote:
> IPs are passed to the rack awareness script. We use 'dig' to do the reverse
> lookup to find the hostname, as we also embed the rack id in the worker node
> hostnames.
> --Mike
> On 11/18/2009 08:20 AM, David J. O'Dell wrote:
>> I'm trying to figure out if I should use IP addresses or DNS names in my
>> rack awareness script.
>> It's easier for me to use DNS names because we have the row and rack
>> number in the name, which means I can dynamically determine the rack
>> without having to manually update the list when adding nodes.
>> However, this won't work if the script is passed IPs as arguments.
>> Does anyone know what is being passed to the script (IPs or DNS names)?
>> Relevant docs:
>> http://hadoop.apache.org/common/docs/r0.20.1/cluster_setup.html#Hadoop+Rack+Awareness
>> and
>> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/net/DNSToSwitchMapping.html#resolve(java.util.List)

It was never clear to me whether IPs or hostnames would be passed. I
specified IPs, short hostnames, and long hostnames just to be safe. And
you know, things sometimes change with Hadoop ::wink-wink::

I have been meaning to plug my topology script for a while (as I think
it is pretty cool). I separated my topology script from my topology
data like so:


while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec< ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then result="${ar[1]}" ; fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default-rack "
  else
    echo -n "$result "
  fi
done

hadoopdata1.ec.com     /dc1/rack1
hadoopdata1            /dc1/rack1
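To sanity-check the lookup, here is a self-contained demo of the same loop against a throwaway data file; the hostnames, rack paths, and the temp dir standing in for ${HADOOP_CONF} are all illustrative:

```shell
# Self-contained demo of the topology lookup; everything here is
# illustrative (a real cluster reads ${HADOOP_CONF}/topology.data).
HADOOP_CONF=$(mktemp -d)
printf '%s\n' 'hadoopdata1.ec.com /dc1/rack1' 'hadoopdata1 /dc1/rack1' \
  > "$HADOOP_CONF/topology.data"

resolve() {
  while [ $# -gt 0 ] ; do
    nodeArg=$1 ; result=""
    while read -r name rack ; do
      if [ "$name" = "$nodeArg" ] ; then result=$rack ; fi
    done < "$HADOOP_CONF/topology.data"
    shift
    if [ -z "$result" ] ; then printf '%s ' /default-rack
    else printf '%s ' "$result" ; fi
  done
}

resolve hadoopdata1 unknown-host ; echo   # prints: /dc1/rack1 /default-rack
```

Unknown nodes fall through to /default-rack, which is the safe default Hadoop expects from a topology script.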

It is great if your hostname reflects the rack name in some parsable
format! Then you do not need to maintain a topology data file like I
do. As of now, I generate it from our asset db.
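For completeness, such a script gets wired into Hadoop via core-site.xml. A sketch using the 0.20-era property names, with an illustrative script path:

```xml
<!-- Hedged example: register the topology script (path is illustrative) -->
<property>
  <name>topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>
<property>
  <name>topology.script.number.args</name>
  <value>100</value>  <!-- max node arguments passed per invocation -->
</property>
```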

Good luck!

