hadoop-mapreduce-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: temporary file locations for YARN applications
Date Mon, 21 Oct 2013 22:49:16 GMT
Thanks, sounds like LOCAL_DIR_ENV is the way to go.
john
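
[Sketch of the approach suggested in the quoted messages below: split the comma-separated directory list, then rotate over the entries while skipping any that lack free space. This is only an illustration under stated assumptions, not code from the thread. The class name LocalDirRoundRobin and its methods are hypothetical, and it uses only the JDK, so the string passed to parse() is assumed to be the value a container reads from the environment variable named by ApplicationConstants.LOCAL_DIR_ENV.]

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: rotates over a container's local directories. */
class LocalDirRoundRobin {
    private final List<File> dirs;
    private int next = 0; // index of the directory to try first on the next call

    LocalDirRoundRobin(List<File> dirs) {
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("no local directories given");
        }
        this.dirs = new ArrayList<>(dirs);
    }

    /** Splits a comma-separated list, trimming whitespace and dropping empty entries. */
    static List<File> parse(String commaSeparated) {
        List<File> result = new ArrayList<>();
        if (commaSeparated == null) {
            return result;
        }
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(new File(trimmed));
            }
        }
        return result;
    }

    /**
     * Returns the next directory in rotation that exists and has at least
     * minFreeBytes usable, or null if no directory qualifies.
     */
    synchronized File nextDir(long minFreeBytes) {
        for (int i = 0; i < dirs.size(); i++) {
            File candidate = dirs.get(next);
            next = (next + 1) % dirs.size();
            if (candidate.isDirectory() && candidate.getUsableSpace() >= minFreeBytes) {
                return candidate;
            }
        }
        return null;
    }
}
```

[Inside a container, the list would presumably be built once at startup from the environment value, and nextDir() called each time a new temporary file is created, so that successive files land on different disks when several are configured.]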

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Monday, October 21, 2013 12:11 PM
To: <user@hadoop.apache.org>
Subject: Re: temporary file locations for YARN applications

The dirs in that env-var are app-specific and are for the app's user to utilize. You shouldn't
have any permission issues working within them.

The LocalDirAllocator is still somewhat MR-bound, but you should still be able to make it
work by giving it a config with the values it needs.

On Mon, Oct 21, 2013 at 8:49 PM, John Lilley <john.lilley@redpoint.net> wrote:
> Thanks again.  This gives me a lot of options; we will see what works.
>
> Do you know if there are any permissions issues if we directly access the
> folders of LOCAL_DIR_ENV?
>
> Regarding LocalDirAllocator, I see its constructor:
> LocalDirAllocator(String contextCfgItemName) and a note mentioning that an
> example of this item is "mapred.local.dir".  Is that the correct usage, or
> is there something YARN-generic?
>
> Cheers,
> john
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Sunday, October 20, 2013 11:58 PM
> To: <user@hadoop.apache.org>
> Subject: Re: temporary file locations for YARN applications
>
> Hi,
>
> MR does use multiple disks when spilling. But the work directory is also
> round-robined to spread I/O.
>
> YARN sets an environment variable that's a comma-separated list of
> directories (ApplicationConstants.LOCAL_DIR_ENV) that your app container
> can use together. Perhaps read it in with
> StringUtils.getTrimmedStrings(System.getenv(ApplicationConstants.LOCAL_DIR_ENV));
> and then round-robin internally over those paths (with free-space
> handling)?
>
> Perhaps you can even reuse the org.apache.hadoop.fs.LocalDirAllocator
> class, which is what MR uses. It has not been declared publicly stable,
> but we can do that over a JIRA.
>
> On Mon, Oct 21, 2013 at 2:05 AM, John Lilley <john.lilley@redpoint.net> wrote:
>> Harsh, thanks for the quick response.  These files don't need to be on
>> the DFS (although we use that too).  These are local files used during
>> sorting, joining, transitive closure.
>>
>> The task-relative folder might be good enough, but our app *can* make
>> use of multiple temp folders if they are available.  Our YARN app can be
>> fairly I/O intensive; is it possible to allocate more than one temp
>> folder on different physical devices?
>>
>> Or perhaps YARN might help us. Will YARN assign tasks to CWD folders on
>> different disks so that they do not compete with each other on I/O?
>>
>> For that matter, where does MR allocate the temporary files generated by
>> Mapper output?  Presumably MR has the same I/O parallelism requirements
>> that we do.
>>
>> Thanks
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Sunday, October 20, 2013 10:49 AM
>> To: <user@hadoop.apache.org>
>> Subject: Re: temporary file locations for YARN applications
>>
>> Every container gets its own local work directory (you can use the
>> relative ./) that's auto-cleaned up at the end of the container's life.
>> This is the best place to store the temporary files. This is not
>> something you need custom configuration for.
>>
>> Do the files need to be on a distributed FS or a local one?
>>
>> On Sun, Oct 20, 2013 at 8:54 PM, John Lilley <john.lilley@redpoint.net> wrote:
>>> We have a pure YARN application (no MapReduce) that needs to
>>> store a significant amount of temporary data.  How can we know the
>>> best location for these files?  How can we ensure that our YARN
>>> tasks have write access to these locations?  Is this something that
>>> must be configured outside of YARN?
>>> Thanks,
>>> John
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



--
Harsh J
