hadoop-common-issues mailing list archives

From "Kannan Rajah (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-11905) Abstraction for LocalDirAllocator
Date Sun, 03 May 2015 01:49:13 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-11905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Rajah updated HADOOP-11905:
----------------------------------
    Attachment: 0001-Abstraction-for-local-disk-path-allocation.patch

> Abstraction for LocalDirAllocator
> ---------------------------------
>
>                 Key: HADOOP-11905
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11905
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.5.2
>            Reporter: Kannan Rajah
>            Assignee: Kannan Rajah
>             Fix For: 2.7.1
>
>         Attachments: 0001-Abstraction-for-local-disk-path-allocation.patch
>
>
> There are 2 abstractions used to write data to local disk.
> LocalDirAllocator: Allocate paths from a set of configured local directories.
> LocalFileSystem/RawLocalFileSystem: Read/write using java.io.* and java.nio.*
> In the current implementation, the local disks are managed by the guest OS and not by HDFS.
The proposal is to provide a new abstraction that encapsulates the above 2 abstractions and
hides who manages the local disks. This enables an alternate implementation in which a DFS
manages the local disks and they are accessed using HDFS APIs. In that case the DFS maintains
a namespace for node local directories and can create paths that are guaranteed to be present
on a specific node.
> Here is an example use case for Shuffle: When a mapper writes intermediate data using
this new implementation, it will continue to write to local disk. When a reducer needs to
access data from a remote node, it can use HDFS APIs with a path that points to that node’s
local namespace instead of having to use an HTTP server to transfer the data across nodes.
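
A minimal sketch of how a reducer might derive such a path, assuming the node local namespace is laid out as <base directory>/<hostname>, as the description of DFSLocalDirAllocator further down suggests. The class name, method names, base directory and file names are hypothetical and are not taken from the attached patch; plain strings stand in for Hadoop Path objects so the example has no Hadoop dependency.

```java
/** Hypothetical helper: resolves a mapper's output inside a node-local DFS namespace. */
final class ShufflePathSketch {

    private ShufflePathSketch() {
    }

    /** Root of one node's local namespace, assumed to be baseDir + "/" + hostname. */
    static String nodeLocalRoot(String baseDir, String hostname) {
        return stripTrailingSlash(baseDir) + "/" + hostname;
    }

    /**
     * Path a reducer would open through the DFS client to read a file that
     * the mapper on mapperHost wrote under its own node-local root.
     */
    static String mapOutputPath(String baseDir, String mapperHost, String relativePath) {
        return nodeLocalRoot(baseDir, mapperHost) + "/" + relativePath;
    }

    /** Avoids a double slash when the configured base directory ends with "/". */
    private static String stripTrailingSlash(String s) {
        return (s.length() > 1 && s.endsWith("/")) ? s.substring(0, s.length() - 1) : s;
    }
}
```

Because the hostname is part of the path, the reducer only needs to know which node ran the mapper; the same relative file name resolves to a different location for each node.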
> New Abstractions
> 1. LocalDiskPathAllocator
> Interface to get file/directory paths from the local disk namespace.
> This contains all the APIs that are currently supported by LocalDirAllocator. So we just
need to change LocalDirAllocator to implement this new interface.
> 2. LocalDiskUtil
> Helper class to get a handle to LocalDiskPathAllocator and the FileSystem
> that is used to manage those paths.
> By default, it will return LocalDirAllocator and LocalFileSystem.
> A supporting DFS can return DFSLocalDirAllocator and an instance of DFS.
> 3. DFSLocalDirAllocator
> This is a generic implementation. An allocator is created for a specific node. It uses the
Configuration object to get the user-configured base directory and appends the node hostname
to it. Hence the returned paths are within the node local namespace.
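
The three abstractions above could fit together roughly as follows. This is only a sketch under stated assumptions, not the contents of the attached patch: a Map stands in for Hadoop's Configuration, paths are plain strings, both configuration keys are hypothetical, and SimpleLocalDirAllocator is a simplified stand-in for the existing LocalDirAllocator (the real class also takes disk capacity into account). Only the method name getLocalPathForWrite is borrowed from LocalDirAllocator.

```java
import java.util.Map;

/** Sketch of the proposed interface, reduced to a single method. */
interface LocalDiskPathAllocator {
    String getLocalPathForWrite(String pathStr);
}

/** Simplified stand-in for LocalDirAllocator: round-robins over configured dirs. */
final class SimpleLocalDirAllocator implements LocalDiskPathAllocator {
    private final String[] localDirs;
    private int next = 0;

    SimpleLocalDirAllocator(String[] localDirs) {
        this.localDirs = localDirs.clone();
    }

    @Override
    public synchronized String getLocalPathForWrite(String pathStr) {
        String dir = localDirs[next];
        next = (next + 1) % localDirs.length;
        return dir + "/" + pathStr;
    }
}

/** Allocator bound to one node: every path lies under baseDir/hostname. */
final class DFSLocalDirAllocator implements LocalDiskPathAllocator {
    private final String nodeRoot;

    DFSLocalDirAllocator(String baseDir, String hostname) {
        this.nodeRoot = baseDir + "/" + hostname;
    }

    @Override
    public String getLocalPathForWrite(String pathStr) {
        return nodeRoot + "/" + pathStr;
    }
}

/** Helper that picks the allocator, falling back to the local-disk default. */
final class LocalDiskUtil {
    // Both keys are hypothetical and exist only for this sketch.
    static final String BASE_DIR_KEY = "example.dfs.local.base.dir";
    static final String LOCAL_DIRS_KEY = "example.local.dirs";

    private LocalDiskUtil() {
    }

    static LocalDiskPathAllocator getPathAllocator(Map<String, String> conf, String hostname) {
        String baseDir = conf.get(BASE_DIR_KEY);
        if (baseDir == null) {
            // Default: the guest OS manages the disks.
            String dirs = conf.getOrDefault(LOCAL_DIRS_KEY, "/tmp");
            return new SimpleLocalDirAllocator(dirs.split(","));
        }
        // A supporting DFS manages the disks through a per-node namespace.
        return new DFSLocalDirAllocator(baseDir, hostname);
    }
}
```

Callers depend only on LocalDiskPathAllocator, so switching between OS-managed and DFS-managed local disks becomes a configuration change rather than a code change. In the proposal, LocalDiskUtil would also hand back the matching FileSystem instance; that part is omitted here.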



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
