hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3636) Abstraction for LocalDirAllocator
Date Tue, 12 May 2015 22:34:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540910#comment-14540910 ]

Vinod Kumar Vavilapalli commented on YARN-3636:
-----------------------------------------------

If this is only for shuffle, I agree: we need an abstraction in Shuffle rather than
a generic interface for LocalDirAllocator.

> Abstraction for LocalDirAllocator
> ---------------------------------
>
>                 Key: YARN-3636
>                 URL: https://issues.apache.org/jira/browse/YARN-3636
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.5.2
>            Reporter: Kannan Rajah
>              Labels: BB2015-05-TBR
>         Attachments: 0001-Abstraction-for-local-disk-path-allocation.patch
>
>
> There are 2 abstractions used to write data to local disk.
> LocalDirAllocator: Allocate paths from a set of configured local directories.
> LocalFileSystem/RawLocalFileSystem: Read/write using java.io.* and java.nio.*
> In the current implementation, local disk is managed by the guest OS and not by HDFS. The
> proposal is to provide a new abstraction that encapsulates the above 2 abstractions and hides
> who manages the local disks. This enables us to provide an alternate implementation where a
> DFS manages the local disks and they can be accessed using HDFS APIs. This means the DFS
> maintains a namespace for node local directories and can create paths that are guaranteed to
> be present on a specific node.
> Here is an example use case for Shuffle: When a mapper writes intermediate data using
> this new implementation, it will continue to write to local disk. When a reducer needs to
> access data from a remote node, it can use HDFS APIs with a path that points to that node’s
> local namespace instead of having to use an HTTP server to transfer the data across nodes.
> New Abstractions
> 1. LocalDiskPathAllocator
> Interface to get file/directory paths from the local disk namespace.
> This contains all the APIs that are currently supported by LocalDirAllocator. So we just
> need to change LocalDirAllocator to implement this new interface.
> 2. LocalDiskUtil
> Helper class to get a handle to LocalDiskPathAllocator and the FileSystem
> that is used to manage those paths.
> By default, it will return LocalDirAllocator and LocalFileSystem.
> A supporting DFS can return DFSLocalDirAllocator and an instance of DFS.
> 3. DFSLocalDirAllocator
> This is a generic implementation. An allocator is created for a specific node. It uses
> the Configuration object to get the user-configured base directory and appends the node
> hostname to it. Hence the returned paths are within the node local namespace.
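
The three proposed abstractions can be illustrated with a minimal Java sketch. It is not taken from the attached patch: the single-method interface, the String-based paths, and the class names SimpleLocalAllocator and NodeLocalDfsAllocator are assumptions made for illustration. The real interface would presumably mirror LocalDirAllocator's methods and use Hadoop's Path and Configuration types, and the in-order directory selection below is a simplification, not LocalDirAllocator's actual policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Small driver showing both allocators behind the same interface.
class AllocatorSketch {
    public static void main(String[] args) {
        LocalDiskPathAllocator local =
            new SimpleLocalAllocator(Arrays.asList("/disk1", "/disk2"));
        LocalDiskPathAllocator dfs =
            new NodeLocalDfsAllocator("/apps/local", "node1.example.com");
        System.out.println(local.getLocalPathForWrite("output/spill0.out"));
        System.out.println(dfs.getLocalPathForWrite("output/spill0.out"));
    }
}

// Hypothetical stand-in for the proposed LocalDiskPathAllocator interface.
interface LocalDiskPathAllocator {
    String getLocalPathForWrite(String relativePath);
}

// Stand-in for the default case: paths come from a set of configured local
// directories. Directories are used in order here (an assumed simplification).
class SimpleLocalAllocator implements LocalDiskPathAllocator {
    private final List<String> localDirs;
    private int next = 0;

    SimpleLocalAllocator(List<String> localDirs) {
        if (localDirs.isEmpty()) {
            throw new IllegalArgumentException("at least one local dir is required");
        }
        this.localDirs = new ArrayList<>(localDirs);
    }

    @Override
    public synchronized String getLocalPathForWrite(String relativePath) {
        String dir = localDirs.get(next);
        next = (next + 1) % localDirs.size();
        return dir + "/" + relativePath;
    }
}

// Stand-in for DFSLocalDirAllocator: the base directory plus the node hostname
// forms the node-local namespace, so every returned path lies under that node.
class NodeLocalDfsAllocator implements LocalDiskPathAllocator {
    private final String nodeRoot;

    NodeLocalDfsAllocator(String baseDir, String hostname) {
        this.nodeRoot = baseDir + "/" + hostname;
    }

    @Override
    public String getLocalPathForWrite(String relativePath) {
        return nodeRoot + "/" + relativePath;
    }
}
```

In the Shuffle use case described above, a reducer could build the same node-scoped path for a remote mapper's hostname and open it through the DFS, since the hostname is the only node-specific part of the path.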



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
