hadoop-common-user mailing list archives

From rab ra <rab...@gmail.com>
Subject running map tasks in remote node
Date Wed, 21 Aug 2013 12:09:10 GMT
Hello,

Here is the newbie question of the day.

For one of my use cases, I want to use Hadoop MapReduce without HDFS.
I will have a text file containing a list of file names to process.
Assume the input text file has 10 lines (10 files to process); I wish
to generate 10 map tasks and execute them in parallel on 10 nodes. I
started with the basic Hadoop tutorial, set up a single-node cluster,
and successfully tested the wordcount example.
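From what I have read, NLineInputFormat may be the way to get one map task per input line; if I understand the docs correctly, selecting it as the job's input format and setting the following property should make each line its own split. This is my assumption from the documentation, not something I have verified:

```xml
<!-- with NLineInputFormat selected as the input format,
     each N lines of the input file become one split;
     N=1 should give one map task per file name listed -->
<property>
  <name>mapred.line.input.format.linespermap</name>
  <value>1</value>
</property>
```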

Now, I took two machines, A (master) and B (slave), and did the
configuration below to set up a two-node cluster.

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
          <name>dfs.replication</name>
          <value>1</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/tmp/hadoop-bala/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/tmp/hadoop-bala/dfs/data</value>
</property>

</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
            <name>mapred.job.tracker</name>
            <value>A:9001</value>
</property>
<property>
          <name>mapreduce.tasktracker.map.tasks.maximum</name>
           <value>1</value>
</property>
</configuration>
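For reference, my understanding is that mapreduce.tasktracker.map.tasks.maximum only caps how many map tasks a single TaskTracker runs concurrently, so with a value of 1 each node should run one map at a time. If I wanted, say, two concurrent maps per node, I assume the setting would simply be:

```xml
<!-- assumed: raises the per-TaskTracker concurrent map slot limit to 2 -->
<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```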

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
         <property>
                <name>fs.default.name</name>
                <value>hdfs://A:9000</value>
        </property>
</configuration>
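Since my eventual goal is to run MapReduce without HDFS, my understanding (unverified) is that fs.default.name can point at the local filesystem instead of a NameNode, along these lines:

```xml
<!-- assumed: use the local filesystem as the default FS instead of HDFS -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
```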


On both A and B, I have a file named ‘slaves’ containing the entry ‘B’,
and another file called ‘masters’ containing the entry ‘A’.

I have kept my input file on A. I can see the map method processing the
input file line by line, but all the map tasks run on A. Ideally, I
would expect some of that processing to take place on B.

Can anyone highlight where I am going wrong?

 regards
rab
