hadoop-hdfs-issues mailing list archives

From "David Powell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5751) Remove the FsDatasetSpi and FsVolumeSpi interfaces
Date Mon, 13 Jan 2014 21:59:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870043#comment-13870043 ]

David Powell commented on HDFS-5751:

Alas, I am intimately familiar with the reimplementation necessary, and wish there were less
of it to do and to maintain.  That said, precluding alternate implementations because creating
one would require more than the ideal amount of work feels like throwing the baby out with
the bathwater.

Moving the abstraction lower is along the lines of what I had in mind when I suggested the
middle ground of changes that reduce mainline maintenance burden while preserving a usable
interface for others.  I think the lower surface of the official FsDatasetImpl is far too
low, however, and that comparing HDFS with ext3fs is both underestimating the complexity and
modularity of HDFS and overestimating the versatility of the simple interface a traditional
filesystem consumes.  Which is to say, I think there is a class of problems which would lead
one to replace a traditional filesystem entirely, but could be solved much more elegantly
in HDFS given its components' architectural separation.

> Remove the FsDatasetSpi and FsVolumeSpi interfaces
> --------------------------------------------------
>                 Key: HDFS-5751
>                 URL: https://issues.apache.org/jira/browse/HDFS-5751
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, test
>    Affects Versions: 3.0.0
>            Reporter: Arpit Agarwal
> The in-memory block map and disk interface portions of the DataNode have been abstracted
out into an {{FsDatasetSpi}} interface, which further uses {{FsVolumeSpi}} to represent individual
volumes.
> The abstraction is useful as it allows DataNode tests to use a {{SimulatedFSDataset}}
which does not write any data to disk. Instead it just stores block metadata in memory and
returns zeroes for all reads. This is useful for both unit testing and for simulating arbitrarily
large datanodes without having to provision real disk capacity.
> A 'real' DataNode uses {{FsDatasetImpl}}. Both {{FsDatasetImpl}} and {{SimulatedFSDataset}}
implement {{FsDatasetSpi}}.
> However, there are a few problems with this approach:
> # Using the factory class significantly complicates the code flow for the common case.
This makes the code harder to understand and debug.
> # There is the additional burden of maintaining two different dataset implementations.
> # Fidelity between the two implementations is poor.
> Instead, we can eliminate the SPIs and just hide the disk read/write routines with a dependency
injection framework like Google Guice.
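
For readers unfamiliar with the arrangement the description refers to, here is a minimal
sketch of the idea, not the actual Hadoop code. The names BlockStore, SimulatedBlockStore,
StoringBlockStore and BlockStoreDemo are hypothetical and far simpler than FsDatasetSpi and
its implementations. It only shows two implementations behind one interface, where the
simulated one keeps block metadata (here just the length) and returns zeroes for all reads.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for the role of an SPI such as FsDatasetSpi.
interface BlockStore {
    void writeBlock(long blockId, byte[] data);
    byte[] readBlock(long blockId);
}

// Simulated store: remembers only each block's length, never its contents.
final class SimulatedBlockStore implements BlockStore {
    private final Map<Long, Integer> lengths = new HashMap<>();

    @Override
    public void writeBlock(long blockId, byte[] data) {
        lengths.put(blockId, data.length);
    }

    @Override
    public byte[] readBlock(long blockId) {
        Integer length = lengths.get(blockId);
        if (length == null) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        return new byte[length]; // Java zero-fills new arrays, so reads are all zeroes
    }
}

// Stand-in for the "real" implementation: keeps the bytes (in memory here, not on disk).
final class StoringBlockStore implements BlockStore {
    private final Map<Long, byte[]> blocks = new HashMap<>();

    @Override
    public void writeBlock(long blockId, byte[] data) {
        blocks.put(blockId, data.clone());
    }

    @Override
    public byte[] readBlock(long blockId) {
        byte[] data = blocks.get(blockId);
        if (data == null) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        return data.clone();
    }
}

final class BlockStoreDemo {
    // The caller depends only on the interface, so either implementation can be passed in.
    static int roundTripSum(BlockStore store) {
        store.writeBlock(1L, new byte[] {1, 2, 3});
        int sum = 0;
        for (byte b : store.readBlock(1L)) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(roundTripSum(new SimulatedBlockStore())); // prints 0
        System.out.println(roundTripSum(new StoringBlockStore()));   // prints 6
    }
}
```

The differing results from the same caller are a small instance of the fidelity gap listed
among the problems above: the simulated store preserves lengths but not contents.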

This message was sent by Atlassian JIRA
