hadoop-common-user mailing list archives

From "Deepak Diwakar" <ddeepa...@gmail.com>
Subject Re: parallel hadoop process reading same input file
Date Fri, 29 Aug 2008 10:49:42 GMT
My good luck: I resolved the problem. To run more than one map task you
need a separate Hadoop directory for each task.
Then go to /hadoop-home/conf and copy the following property from
hadoop-default.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

and paste it into hadoop-site.xml, setting a different value for each Hadoop
directory. Then there will be no conflict when the different map tasks write
their intermediate files. A sketch of the two configurations is below.
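
For example, a minimal sketch of the two hadoop-site.xml files, assuming the
two Hadoop directories are /home/deepak/hadoop-1 and /home/deepak/hadoop-2
(these paths and the tmp-dir suffixes are placeholders, use your own):

<!-- /home/deepak/hadoop-1/conf/hadoop-site.xml -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}-job1</value>
    <description>Base temp dir for the first Hadoop directory.</description>
  </property>
</configuration>

<!-- /home/deepak/hadoop-2/conf/hadoop-site.xml -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}-job2</value>
    <description>Base temp dir for the second Hadoop directory.</description>
  </property>
</configuration>

With distinct hadoop.tmp.dir values the two standalone jobs keep their
intermediate files apart, so both can run on the same node at the same time.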

Thanks,
Deepak

2008/8/29 Deepak Diwakar <ddeepak4u@gmail.com>

> I am running two different Hadoop map/reduce tasks in standalone mode on a
> single node, both reading the same folder. I found that Task1 was not able to
> process files that had already been processed by Task2, and vice-versa; it
> gave an IO error. It seems that in standalone mode the map task locks the
> file internally while processing it (hopefully that is not the case in DFS
> mode).
>
> One more observation: two map tasks can't run simultaneously on a single task
> tracker or single node (even if you set up two different Hadoop directories
> and try to run a map task from each place). A possible reason is that Hadoop
> stores its intermediate map/reduce output in some file format under the /tmp/
> folder, so if we run two map tasks simultaneously they conflict when writing
> the intermediate files to the same location, which results in an error.
>
> This is my interpretation.
>
> Any feasible solution for standalone mode would be appreciated.
>
> Thanks
> Deepak
>
>
>
> 2008/8/28 lohit <lohit_bv@yahoo.com>
>
> Hi Deepak,
>> Can you explain what processes and what files they are trying to read? If
>> you are talking about map/reduce tasks reading files on DFS, then yes,
>> parallel reads are allowed. Multiple writers are not.
>> -Lohit
>>
>>
>>
>> ----- Original Message ----
>> From: Deepak Diwakar <ddeepak4u@gmail.com>
>> To: core-user@hadoop.apache.org
>> Sent: Thursday, August 28, 2008 6:06:58 AM
>> Subject: parallel hadoop process reading same input file
>>
>> Hi,
>>
>> When I run two Hadoop processes in parallel and both processes have to
>> read the same file, it fails.
>> Of course one solution is to keep a copy of the file in a different location
>> so that accessing it simultaneously would not cause any problem. But what if
>> we don't want to do so because it costs extra space?
>> Please suggest a suitable solution to this.
>>
>> Thanks & Regards,
>> Deepak
>>
>>
>
>
>
>
>
