hadoop-hdfs-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: temporary file locations for YARN applications
Date Sun, 20 Oct 2013 20:35:15 GMT
Harsh, thanks for the quick response.  These files don't need to be on the DFS (although we
use that too).  These are local files used during sorting, joining, and transitive closure.

The task-relative folder might be good enough, but our app *can* make use of multiple temp
folders if they are available.  Our YARN app can be fairly I/O intensive; is it possible to
allocate more than one temp folder on different physical devices?  
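
A minimal sketch of spreading temp files across several directories, for illustration only.
It assumes the NodeManager exports a comma-separated LOCAL_DIRS environment variable to the
container (believed to be the case on Hadoop 2.x, but check your version), and it falls back
to the container's working directory when the variable is absent. TempDirPicker and its
methods are hypothetical names, not part of YARN. Whether the listed directories sit on
different physical devices depends on how the admin configured yarn.nodemanager.local-dirs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a YARN API.
class TempDirPicker {
    // Splits a comma-separated directory list; falls back to "." (the
    // container's working directory) when the list is null or empty.
    static List<String> parseLocalDirs(String localDirs) {
        List<String> dirs = new ArrayList<>();
        if (localDirs != null) {
            for (String d : localDirs.split(",")) {
                String trimmed = d.trim();
                if (!trimmed.isEmpty()) {
                    dirs.add(trimmed);
                }
            }
        }
        if (dirs.isEmpty()) {
            dirs.add(".");
        }
        return dirs;
    }

    // Round-robin choice: the n-th temp file goes to directory n mod count.
    static String pick(List<String> dirs, int fileIndex) {
        return dirs.get(Math.floorMod(fileIndex, dirs.size()));
    }
}
```

Inside a container this would be called as
TempDirPicker.parseLocalDirs(System.getenv("LOCAL_DIRS")), with pick() choosing the
directory for each successive sort or join spill file.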

Or perhaps YARN might help us. Will YARN assign tasks to CWD folders on different disks so
that they do not compete with each other on I/O?  

For that matter, where does MR allocate the temporary files generated by Mapper output?  Presumably
MR has the same I/O parallelism requirements that we do.


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Sunday, October 20, 2013 10:49 AM
To: <user@hadoop.apache.org>
Subject: Re: temporary file locations for YARN applications

Every container gets its own local work directory (you can use the relative path ./) that's auto-cleaned
up at the end of the container's life.
This is the best place to store the temporary files. This is not something you need custom
configuration for.
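
As a rough illustration of the point above, the sketch below creates a scratch file under
the process's current working directory, which for a YARN container is its local work
directory. ContainerScratch and createScratchFile are hypothetical names; only standard
java.nio.file calls are used, and nothing here is YARN-specific.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not a YARN API.
class ContainerScratch {
    // Creates a uniquely named scratch file in the current working directory.
    // In a YARN container the working directory is the container's own
    // local directory, so no extra configuration is assumed.
    static Path createScratchFile(String prefix) throws IOException {
        Path cwd = Paths.get(".").toAbsolutePath().normalize();
        return Files.createTempFile(cwd, prefix, ".tmp");
    }
}
```

A call such as ContainerScratch.createScratchFile("sort-") returns the path of the new
file; the application can still delete it early to free space before the container exits.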

Do the files need to be on a distributed FS or a local one?

On Sun, Oct 20, 2013 at 8:54 PM, John Lilley <john.lilley@redpoint.net> wrote:
> We have a pure YARN application (no MapReduce) that needs to store 
> a significant amount of temporary data.  How can we know the best 
> location for these files?  How can we ensure that our YARN tasks have 
> write access to these locations?  Is this something that must be configured outside of
> Thanks,
> John

Harsh J
