hadoop-common-user mailing list archives

From "Deepak Diwakar" <ddeepa...@gmail.com>
Subject Re: parallel hadoop process reading same input file
Date Fri, 29 Aug 2008 10:49:42 GMT
My good luck: I resolved the problem. To run more than one map task you need
a different Hadoop directory for each. Go to /hadoop-home/conf and copy the
following property from hadoop-default.xml:

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>

Paste it into hadoop-site.xml and set the value field to a different path in
each Hadoop directory. Then there is no conflict when the intermediate files
for the different map tasks are kept.
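[Editor's note: the per-directory override described above can be sketched as a
small shell script. The /tmp/hadoop-demo paths and the instance names "one" and
"two" are made up for illustration; substitute your own Hadoop directories.]

```shell
#!/bin/sh
# Sketch: give each Hadoop instance its own hadoop.tmp.dir so their
# intermediate map/reduce files do not collide under a shared /tmp base.
set -e
for name in one two; do
  conf="/tmp/hadoop-demo/$name/conf"
  mkdir -p "$conf"
  # Write a minimal hadoop-site.xml overriding hadoop.tmp.dir per instance.
  cat > "$conf/hadoop-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-$name</value>
  </property>
</configuration>
EOF
done
# Show that the two instances now point at different temp bases.
grep -h '<value>' /tmp/hadoop-demo/one/conf/hadoop-site.xml \
                  /tmp/hadoop-demo/two/conf/hadoop-site.xml
```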


2008/8/29 Deepak Diwakar <ddeepak4u@gmail.com>

> I am running two different Hadoop map/reduce tasks in standalone mode on a
> single node, both reading the same folder. I found that Task1 was not able to
> process files which had already been processed by Task2, and vice versa; it
> gave an IO error. It seems that in standalone mode the map task locks the
> file internally while processing it (hopefully that is not the case in DFS
> mode).
> One more observation: two map tasks can't run simultaneously on a single task
> tracker or single node (even if you set up two different Hadoop directories
> and try to run a map task from each place). The possible reason I can think
> of is that Hadoop stores its intermediate map/reduce task output in some file
> format in the /tmp/ folder; hence, if we run two map tasks simultaneously,
> they conflict over keeping the intermediate files at the same location, which
> results in an error.
> This is my interpretation.
> Any feasible solution for standalone mode is appreciated.
> Thanks
> Deepak
> 2008/8/28 lohit <lohit_bv@yahoo.com>
>> Hi Deepak,
>> Can you explain what process and what files they are trying to read? If
>> you are talking about map/reduce tasks reading files on DFS, then, yes
>> parallel reads are allowed. Multiple writers are not.
>> -Lohit
>> ----- Original Message ----
>> From: Deepak Diwakar <ddeepak4u@gmail.com>
>> To: core-user@hadoop.apache.org
>> Sent: Thursday, August 28, 2008 6:06:58 AM
>> Subject: parallel hadoop process reading same input file
>> Hi,
>> When I run two Hadoop processes in parallel and both have to read the same
>> file, it fails.
>> Of course, one solution is to keep a copy of the file in a different
>> location so that simultaneous access would not cause any problem. But what
>> if we don't want to do so because it costs extra space?
>> Please suggest a suitable solution for this.
>> Thanks & Regards,
>> Deepak
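
[Editor's note: Lohit's point, that parallel reads of one file are safe and the
failures come from shared writable state, also holds on a local filesystem. A
minimal sketch below demonstrates two concurrent readers of one file both
succeeding; the paths under /tmp are made up for illustration.]

```shell
#!/bin/sh
# Two readers of the same input file, run concurrently: both succeed.
# The conflicts in this thread come from shared *writable* state
# (intermediate output under /tmp), not from the reads themselves.
printf 'line1\nline2\n' > /tmp/shared-input.txt
cat /tmp/shared-input.txt > /tmp/reader1.out &
cat /tmp/shared-input.txt > /tmp/reader2.out &
wait
# Both readers saw the same, complete data.
cmp /tmp/reader1.out /tmp/reader2.out && echo "both readers got identical data"
```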
