hadoop-hdfs-dev mailing list archives

From Gokul Soundararajan <gokulsoun...@gmail.com>
Subject Re: NFSv3 Filesystem Connector
Date Thu, 15 Jan 2015 02:15:40 GMT
Hi Colin,

Yeah, I should add the reasons to the README. We tried LocalFileSystem when
we started out but we think we can do tighter Hadoop integration if we
write a connector.

Some examples include:
1. Limit over-prefetching of data - MapReduce divides a job's input into 128MB
splits, and the standard NFS driver tends to over-prefetch from a file. We limit
the prefetching to the split size.
2. Lazy write commits - For writes, we can relax the write guarantees
(making writes faster) and commit just before the task ends.
3. Provide for location awareness - Later, we can hook some NFS smarts into
getFileBlockLocations() (we have some ideas but have not implemented them yet).
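
To make item 1 concrete, here is a minimal Java sketch of clamping a
read-ahead request so that it never runs past the end of the current split.
It is an illustration only, not the connector's actual code: the class name
SplitBoundedPrefetch, the method prefetchLength, and its parameters are all
hypothetical, and a real implementation would live inside the connector's
input stream.

```java
/**
 * Hypothetical helper: bounds a read-ahead request by the split a task is reading.
 * Not part of Hadoop or the NFS connector; the names are illustrative assumptions.
 */
final class SplitBoundedPrefetch {
    private SplitBoundedPrefetch() {}

    /**
     * Returns how many bytes to prefetch starting at {@code position}.
     *
     * @param position    current read offset in the file
     * @param splitStart  offset of the first byte of the split
     * @param splitLength number of bytes in the split
     * @param readAhead   read-ahead size the client would normally use
     */
    static long prefetchLength(long position, long splitStart,
                               long splitLength, long readAhead) {
        if (position < 0 || splitStart < 0 || splitLength < 0 || readAhead < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        long splitEnd = splitStart + splitLength; // exclusive end of the split
        if (position < splitStart || position >= splitEnd) {
            return 0L; // outside the split: nothing worth prefetching
        }
        // Never read past the split boundary, even if readAhead is larger.
        return Math.min(readAhead, splitEnd - position);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Mid-split: the full read-ahead fits.
        System.out.println(prefetchLength(0L, 0L, 128L * mb, 4L * mb));
        // Near the end of the split: the request is clamped to the remaining bytes.
        System.out.println(prefetchLength(126L * mb, 0L, 128L * mb, 4L * mb));
    }
}
```

The point of the clamp is that a task assigned one split never pulls bytes
belonging to the next split, which some other task will read anyway.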

Hope this helps.

Gokul



On Wed, Jan 14, 2015 at 10:47 AM, Colin McCabe <cmccabe@alumni.cmu.edu>
wrote:

> Why not just use LocalFileSystem with an NFS mount (or several)?  I read
> through the README but I didn't see that question answered anywhere.
>
> best,
> Colin
>
> On Tue, Jan 13, 2015 at 1:35 PM, Gokul Soundararajan <gokulsoundar@gmail.com>
> wrote:
>
> > Hi,
> >
> > We (Jingxin Feng, Xing Lin, and I) have been working on providing a
> > FileSystem implementation that allows Hadoop to utilize an NFSv3 storage
> > server as a filesystem. It leverages code from the hadoop-nfs project for
> > all the request/response handling. We would like your help to add it as
> > part of hadoop tools (similar to the way hadoop-aws and hadoop-azure are).
> >
> > In more detail, the Hadoop NFS Connector allows Apache Hadoop (2.2+) and
> > Apache Spark (1.2+) to use an NFSv3 storage server as a storage endpoint.
> > The NFS Connector can be run in two modes: (1) secondary filesystem -
> > where Hadoop/Spark runs using HDFS as its primary storage and can use NFS
> > as a second storage endpoint, and (2) primary filesystem - where
> > Hadoop/Spark runs entirely on an NFSv3 storage server.
> >
> > The code is written in a way such that existing applications do not have
> > to change. All one has to do is to copy the connector jar into the lib/
> > directory of Hadoop/Spark. Then, modify core-site.xml to provide the
> > necessary details.
> >
> > The current version can be seen at:
> > https://github.com/NetApp/NetApp-Hadoop-NFS-Connector
> >
> > It is my first time contributing to the Hadoop codebase. It would be
> > great if someone on the Hadoop team can guide us through this process.
> > I'm willing to make the necessary changes to integrate the code. What are
> > the next steps? Should I create a JIRA entry?
> >
> > Thanks,
> >
> > Gokul
> >
>
