hadoop-common-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: running map tasks in remote node
Date Fri, 23 Aug 2013 12:51:49 GMT
You say:
"Each map process gets a line. The map process will then do a file transfer
and process it.  "

What file is being transferred in the map, and from where to where? Are you
sure the mappers are not complaining about access to 'this' file? It seems to
be separate from the initial input data that each mapper gets (i.e., your
understanding that the "map method contains contents of the input file").

Regards,
Shahab


On Fri, Aug 23, 2013 at 6:13 AM, rab ra <rabmdu@gmail.com> wrote:

> Thanks for the reply.
>
> I am basically exploring possible ways to work with the Hadoop framework for
> one of my use cases. I have limitations on using HDFS, but I agree that
> using MapReduce in conjunction with HDFS makes sense.
>
> After some googling, I successfully tested WholeFileInputFormat.
>
> Now, coming to my use case: I would like to keep some files on my master
> node and do some processing on the cloud nodes. The policy does not allow us
> to configure and use the cloud nodes as HDFS. However, I would like to spawn
> map processes on those nodes. Hence, I set the input path to the local file
> system, for example $HOME/inputs. In this input directory I have a file
> listing filenames (10 lines). I use NLineInputFormat and spawn 10 map
> processes. Each map process gets a line. The map process will then do a file
> transfer and process it. However, I get a FileNotFoundException in the map
> for $HOME/inputs. This directory is present on my master but not on the
> slave nodes; when I copy the input directory to the slave nodes, it works
> fine. I cannot figure out the reason for the error or how to fix it. I do
> not understand why it complains that the input directory is not present. As
> far as I know, the slave nodes get a map, and the map method contains the
> contents of the input file, which should be enough for the map logic to
> work.
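A minimal, illustrative sketch of such a mapper, assuming the new
org.apache.hadoop.mapreduce API; the class name FileNameMapper is
hypothetical. With NLineInputFormat the value handed to map() is one line of
the listing file, i.e. a filename, not that file's contents, so the mapper
itself has to open the path, and the path must be readable on whichever node
the task is scheduled on:

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: each call to map() receives one filename from the
// NLineInputFormat listing and reads that file on the node running the task.
public class FileNameMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // The value is the line itself, e.g. a path like /home/user/inputs/data1.txt
        Path file = new Path(line.toString().trim());
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        if (!fs.exists(file)) {
            // This is where a path that exists only on the master will fail
            throw new IOException("Listed file not found on this node: " + file);
        }
        // ... transfer and process the file here ...
        context.write(new Text(file.getName()), new Text("processed"));
    }
}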
>
>
> with regards
> rabmdu
>
>
>
>
> On Thu, Aug 22, 2013 at 4:40 PM, java8964 java8964 <java8964@hotmail.com>wrote:
>
>> If you don't plan to use HDFS, what kind of shared file system are you
>> going to use between the cluster nodes? NFS?
>> For what you want to do, even though it doesn't make much sense, you first
>> need to solve the problem of the shared file system.
>>
>> Second, if you want to process the files file by file, instead of block by
>> block as in HDFS, then you need to use a WholeFileInputFormat (google how
>> to write one). Then you don't need a file listing all the files to be
>> processed; just put them into one folder in the shared file system and pass
>> that folder to your MR job. That way, as long as each node can access the
>> folder through some file system URL, each file will be processed in its own
>> mapper.
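A minimal sketch of the WholeFileInputFormat idea mentioned above, assuming
the new org.apache.hadoop.mapreduce API; this is not a stock Hadoop class, and
the names and details below are illustrative. It marks each file as
non-splittable and hands the whole file to a mapper as a single record:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;   // one file -> one split -> one mapper
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new RecordReader<NullWritable, BytesWritable>() {
            private FileSplit fileSplit;
            private TaskAttemptContext ctx;
            private final BytesWritable value = new BytesWritable();
            private boolean processed = false;

            @Override
            public void initialize(InputSplit s, TaskAttemptContext c) {
                fileSplit = (FileSplit) s;
                ctx = c;
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                if (processed) return false;
                // Read the entire file into a single value
                byte[] contents = new byte[(int) fileSplit.getLength()];
                Path file = fileSplit.getPath();
                FileSystem fs = file.getFileSystem(ctx.getConfiguration());
                FSDataInputStream in = fs.open(file);
                try {
                    IOUtils.readFully(in, contents, 0, contents.length);
                    value.set(contents, 0, contents.length);
                } finally {
                    IOUtils.closeStream(in);
                }
                processed = true;
                return true;
            }

            @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
            @Override public BytesWritable getCurrentValue() { return value; }
            @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
            @Override public void close() { }
        };
    }
}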
>>
>> Yong
>>
>> ------------------------------
>> Date: Wed, 21 Aug 2013 17:39:10 +0530
>> Subject: running map tasks in remote node
>> From: rabmdu@gmail.com
>> To: user@hadoop.apache.org
>>
>>
>> Hello,
>>
>> Here is the newbie question of the day.
>>
>> For one of my use cases, I want to use Hadoop MapReduce without HDFS. Here,
>> I will have a text file containing a list of file names to process. Assume
>> that I have 10 lines (10 files to process) in the input text file and I
>> wish to generate 10 map tasks and execute them in parallel on 10 nodes. I
>> started with the basic Hadoop tutorial, set up a single-node cluster, and
>> successfully tested the wordcount code.
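To make the numbers above concrete, here is a rough driver sketch, assuming
the new org.apache.hadoop.mapreduce API; the class names FileListDriver and
FileNameMapper are illustrative (a FileNameMapper sketch appears earlier on
this page). NLineInputFormat with one line per split is what turns a 10-line
listing file into 10 map tasks:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative driver: one listed filename per map task via NLineInputFormat.
public class FileListDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = new Job(conf, "process listed files");
        job.setJarByClass(FileListDriver.class);
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 1);   // one line of the listing per mapper
        job.setMapperClass(FileNameMapper.class);       // hypothetical mapper, sketched earlier
        job.setNumReduceTasks(0);                       // map-only job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // the 10-line listing file
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}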
>>
>> Now, I took two machines, A (master) and B (slave), and did the
>> configuration below to set up a two-node cluster.
>>
>> hdfs-site.xml
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> <!-- Put site-specific property overrides in this file. -->
>> <configuration>
>> <property>
>>           <name>dfs.replication</name>
>>           <value>1</value>
>> </property>
>> <property>
>>   <name>dfs.name.dir</name>
>>   <value>/tmp/hadoop-bala/dfs/name</value>
>> </property>
>> <property>
>>   <name>dfs.data.dir</name>
>>   <value>/tmp/hadoop-bala/dfs/data</value>
>> </property>
>> <property>
>>      <name>mapred.job.tracker</name>
>>     <value>A:9001</value>
>> </property>
>>
>> </configuration>
>>
>> mapred-site.xml
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <!-- Put site-specific property overrides in this file. -->
>>
>> <configuration>
>> <property>
>>             <name>mapred.job.tracker</name>
>>             <value>A:9001</value>
>> </property>
>> <property>
>>           <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>            <value>1</value>
>> </property>
>> </configuration>
>>
>> core-site.xml
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> <!-- Put site-specific property overrides in this file. -->
>> <configuration>
>>          <property>
>>                 <name>fs.default.name</name>
>>                 <value>hdfs://A:9000</value>
>>         </property>
>> </configuration>
>>
>>
>> On both A and B, I have a file named ‘slaves’ containing the entry ‘B’ and
>> another file named ‘masters’ containing the entry ‘A’.
>>
>> I have kept my input file on A. I see the map method process the input
>> file line by line, but everything is processed on A. Ideally, I would
>> expect that processing to take place on B.
>>
>> Can anyone highlight where I am going wrong?
>>
>>  regards
>> rab
>>
>
>
