hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Creating and working with temporary file in a map() function
Date Sun, 08 Apr 2012 19:06:26 GMT
It will work. Pseudo-distributed mode shouldn't be all that different
from a fully distributed mode. Do let us know if it does not work as
intended.
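For reference, here is a minimal sketch of the pattern in plain Java (no Hadoop classes; the class and method names are just for illustration). Inside map() the same File.createTempFile call would land the file under the task attempt's working dir/tmp, since the framework points -Djava.io.tmpdir there:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class TempFileDemo {

    // Write a payload to a temp file, then read it back.
    // File.createTempFile places the file under -Djava.io.tmpdir,
    // which MapReduce sets per task attempt via mapred.child.tmp.
    public static String roundTrip(String payload) throws IOException {
        File tmp = File.createTempFile("map-scratch-", ".tmp");
        tmp.deleteOnExit(); // belt-and-braces; the framework cleans the dir anyway
        try (FileWriter w = new FileWriter(tmp)) {
            w.write(payload);
        }
        try (BufferedReader r = new BufferedReader(new FileReader(tmp))) {
            return r.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello"));
    }
}
```

The size limit is whatever local disk is available on the node hosting the task attempt, not anything MapReduce-specific.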

On Sun, Apr 8, 2012 at 11:40 PM, Ondřej Klimpera <klimpond@fit.cvut.cz> wrote:
> Thanks for your advice, File.createTempFile() works great, at least in
> pseudo-distributed mode; hopefully the cluster setup will work the same
> way. You saved me hours of trying...
>
>
>
> On 04/07/2012 11:29 PM, Harsh J wrote:
>>
>> MapReduce automatically sets "mapred.child.tmp" for every task to the
>> Task Attempt's WorkingDir/tmp. This also sets the -Djava.io.tmpdir
>> property for each task at JVM boot.
>>
>> Hence you may use the regular Java API to create a temporary file:
>>
>> http://docs.oracle.com/javase/6/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String)
>>
>> These files are also deleted automatically after the task attempt is
>> done.
>>
>> On Sun, Apr 8, 2012 at 2:14 AM, Ondřej Klimpera<klimpond@fit.cvut.cz>
>>  wrote:
>>>
>>> Hello,
>>>
>>> I would like to ask you if it is possible to create and work with a
>>> temporary file while in a map function.
>>>
>>> I suppose that map function is running on a single node in Hadoop
>>> cluster.
>>> So what is a safe way to create a temporary file and read from it in one
>>> map() run? And if it is possible, is there a size limit for the file?
>>>
>>> The file cannot be created before the Hadoop job starts; I need to
>>> create and process the file inside map().
>>>
>>> Thanks for your answer.
>>>
>>> Ondrej Klimpera.
>>
>>
>>
>



-- 
Harsh J
