hadoop-hdfs-issues mailing list archives

From "Virajith Jalaparti (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9809) Abstract implementation-specific details from the datanode
Date Fri, 18 Mar 2016 23:07:33 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202333#comment-15202333 ]

Virajith Jalaparti commented on HDFS-9809:

The motivation behind this JIRA is HDFS-9806, where data can be stored in remote filesystems
and datanodes will not hold the block data in local files. The current implementation of the
Datanode assumes that block data is always stored in a {{java.io.File}} (e.g., {{FsVolumeSpi#getBasePath}}).
This JIRA aims to confine this assumption to the classes that directly access/read/write
the block data ({{FsVolumeImpl}} and {{ReplicaInfo}}). This will enable us to minimize the
changes to the datanode in HDFS-9806. For example, checks that the actual block data is
not stored as a {{java.io.File}} but at a remote URI can be confined to {{FsVolumeImpl}}
and don’t have to be added to the parts of the datanode which access or can potentially access
the block data.

Below we list the reasons behind the changes (in the patch submitted) to different classes
in the datanode. 

h4. ReplicaInfo
The {{java.io.File}} related APIs in {{ReplicaInfo}} ({{getBlockFile}}, {{getMetaFile}}) are
moved to a subclass of {{ReplicaInfo}} called {{LocalReplica}}. The classes {{FinalizedReplica}},
{{ReplicaInPipeline}}, {{ReplicaUnderRecovery}}, and {{ReplicaWaitingToBeRecovered}} are changed
to be subclasses of {{LocalReplica}} instead of {{ReplicaInfo}}. The motivation behind this
change is that we can have {{ReplicaInfo}} s that point to blocks located in remote stores
and as a result don’t have associated {{java.io.File}} s. 
We also added several methods to {{ReplicaInfo}} to replace the calls to {{ReplicaInfo#getBlockFile}}
and {{ReplicaInfo#getMetaFile}} in the rest of the code. 
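
A minimal sketch of the hierarchy described above, not the code in the patch. Only the names {{ReplicaInfo}}, {{LocalReplica}}, {{FinalizedReplica}}, {{getBlockFile}} and {{getMetaFile}} come from this comment; {{RemoteReplica}}, {{getBlockURI}}, the constructors and the simplified file naming are assumptions made for illustration.

```java
import java.io.File;
import java.net.URI;

// Simplified stand-ins for the classes named in this comment; not the Hadoop sources.
abstract class ReplicaInfo {
    private final long blockId;

    ReplicaInfo(long blockId) { this.blockId = blockId; }

    long getBlockId() { return blockId; }

    // Hypothetical File-free accessor that callers could use instead of getBlockFile().
    abstract URI getBlockURI();
}

// Holds the java.io.File-specific state and accessors.
class LocalReplica extends ReplicaInfo {
    private final File dir;

    LocalReplica(long blockId, File dir) {
        super(blockId);
        this.dir = dir;
    }

    // File names here are placeholders, not the real on-disk layout.
    File getBlockFile() { return new File(dir, "blk_" + getBlockId()); }

    File getMetaFile() { return new File(dir, "blk_" + getBlockId() + ".meta"); }

    @Override
    URI getBlockURI() { return getBlockFile().toURI(); }
}

class FinalizedReplica extends LocalReplica {
    FinalizedReplica(long blockId, File dir) { super(blockId, dir); }
}

// Hypothetical replica whose data lives in a remote store; it has no File at all.
class RemoteReplica extends ReplicaInfo {
    private final URI remote;

    RemoteReplica(long blockId, URI remote) {
        super(blockId);
        this.remote = remote;
    }

    @Override
    URI getBlockURI() { return remote; }
}
```

Code that only needs to know where a block lives can work against {{ReplicaInfo}}; only code that really needs a local file has to check for {{LocalReplica}}.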

h4. FsVolumeSpi and StorageLocation
Instead of associating an FsVolume with a base path (which is a {{java.io.File}}), we associate
it with a {{StorageLocation}}. This allows us to remove the dependence on {{java.io.File}}
and replace it with a more general type that can point to either a {{java.io.File}} or an abstract
{{URI}} representing external storage. Using {{StorageLocation}} instead of defining a
new type for location allows us to reuse its functionality and plug into the rest of the code
easily. Accordingly, we replaced {{FsVolumeSpi#getBasePath}} with {{FsVolumeSpi#getStorageLocation}}.
As a result, comparisons and references to FsVolumes which were done using the {{java.io.File}}
returned by {{FsVolumeSpi#getBasePath}} are now replaced by comparisons and references to
the {{StorageLocation}} returned by {{FsVolumeSpi#getStorageLocation}}. 
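
To illustrate comparing volumes by location rather than by base-path file, here is a rough sketch. The {{StorageLocation}} and {{FsVolumeSpi}} types below are simplified stand-ins written for this example, not the Hadoop classes; {{fromFile}}, {{getUri}} and {{VolumeLookup}} are hypothetical names.

```java
import java.io.File;
import java.net.URI;
import java.util.List;
import java.util.Objects;

// Simplified stand-in for StorageLocation: a location identified only by a URI.
final class StorageLocation {
    private final URI uri;

    StorageLocation(URI uri) { this.uri = Objects.requireNonNull(uri); }

    // Hypothetical convenience factory for locations backed by a local directory.
    static StorageLocation fromFile(File dir) { return new StorageLocation(dir.toURI()); }

    URI getUri() { return uri; }

    @Override
    public boolean equals(Object o) {
        return o instanceof StorageLocation && uri.equals(((StorageLocation) o).uri);
    }

    @Override
    public int hashCode() { return uri.hashCode(); }
}

// Simplified stand-in for FsVolumeSpi with only the accessor discussed here.
interface FsVolumeSpi {
    StorageLocation getStorageLocation();
}

final class VolumeLookup {
    // Find a volume by comparing StorageLocations; returns null if none matches.
    static FsVolumeSpi find(List<FsVolumeSpi> volumes, StorageLocation wanted) {
        for (FsVolumeSpi v : volumes) {
            if (v.getStorageLocation().equals(wanted)) {
                return v;
            }
        }
        return null;
    }
}
```

The same lookup works for a volume on local disk and for one that points at an external store, because neither side of the comparison is a {{java.io.File}}.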

Extending this further, we attempted to make the following changes to the Datanode: (a) associate
{{StorageDirectory}} with {{StorageLocation}}, instead of {{java.io.File}} (replacing calls
to {{StorageDirectory#getRoot}} by {{StorageDirectory#getStorageLocation}}) and (b) remove
references to {{StorageLocation#getFile}}. 

h4. DirectoryScanner.ReportCompiler
The {{DirectoryScanner.ReportCompiler}} calls {{FsVolumeSpi#getFinalizedDir}} and compiles
the report assuming that this returns a {{java.io.File}}. However, in HDFS-9806, data may
not be stored in files. Further, the {{DirectoryScanner.ReportCompiler#compileReport}} function
makes assumptions about the way blocks are stored in FsVolumes, which can differ across {{FsVolumeSpi}}
implementations. To remove these assumptions and to let each volume keep the details of how it
implements its storage to itself, we moved {{ReportCompiler#compileReport}} into {{FsVolumeSpi}},
so that each implementation provides its own. 
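
A rough sketch of the idea, under loud assumptions: the interface below is a stand-in for {{FsVolumeSpi}}, the no-argument {{compileReport}} returning block names is a simplification and not the signature in the patch, and {{LocalVolume}} and {{RemoteVolume}} are hypothetical implementations.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for FsVolumeSpi; the real compileReport signature differs.
interface FsVolumeSpi {
    // Each implementation enumerates its own blocks; callers never see a File.
    List<String> compileReport();
}

// File-backed volume: walks a local directory, as the scanner used to do itself.
class LocalVolume implements FsVolumeSpi {
    private final File finalizedDir;

    LocalVolume(File finalizedDir) { this.finalizedDir = finalizedDir; }

    @Override
    public List<String> compileReport() {
        List<String> report = new ArrayList<>();
        File[] files = finalizedDir.listFiles(); // null if the directory is missing
        if (files != null) {
            for (File f : files) {
                String name = f.getName();
                // Placeholder naming rule: block files start with "blk_", metadata ends in ".meta".
                if (name.startsWith("blk_") && !name.endsWith(".meta")) {
                    report.add(name);
                }
            }
        }
        Collections.sort(report);
        return report;
    }
}

// Hypothetical volume backed by external storage; the list stands in for a remote listing.
class RemoteVolume implements FsVolumeSpi {
    private final List<String> remoteBlockNames;

    RemoteVolume(List<String> remoteBlockNames) {
        this.remoteBlockNames = new ArrayList<>(remoteBlockNames);
    }

    @Override
    public List<String> compileReport() {
        List<String> report = new ArrayList<>(remoteBlockNames);
        Collections.sort(report);
        return report;
    }
}
```

With this shape, the scanner only asks each volume for its report and no longer needs {{getFinalizedDir}} to return a {{java.io.File}}.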

h4. FsDatasetImpl
Currently, the functions in {{FsDatasetImpl}} that create new {{ReplicaInfo}} objects (in different
states such as RUR, Temporary, and RBW, as part of the data pipeline) all assume that
blocks are associated with {{java.io.File}} s. To remove this dependency, we moved these functions
into {{FsVolumeImpl}}. This gives {{FsVolumeImpl}} the flexibility to handle {{ReplicaInfo}}
s as it sees fit. In particular, if a certain {{FsVolumeImpl}} uses external storage to store
block data, it can perform these functions appropriately. 
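
As an illustration only, the delegation could look roughly like the sketch below. This comment does not name the functions that were moved, so {{createRbw}}, {{Volume}}, {{LocalVolume}}, {{RemoteVolume}} and {{Dataset}} are all hypothetical, and {{ReplicaInfo}} is reduced to a plain value holder.

```java
import java.io.File;
import java.net.URI;

// Simplified stand-in: a replica described only by id, state and location.
final class ReplicaInfo {
    final long blockId;
    final String state;
    final URI location;

    ReplicaInfo(long blockId, String state, URI location) {
        this.blockId = blockId;
        this.state = state;
        this.location = location;
    }
}

// Hypothetical volume-side factory method for a replica being written (RBW).
interface Volume {
    ReplicaInfo createRbw(long blockId);
}

// Local volume: the only place that builds a java.io.File.
class LocalVolume implements Volume {
    private final File rbwDir;

    LocalVolume(File rbwDir) { this.rbwDir = rbwDir; }

    @Override
    public ReplicaInfo createRbw(long blockId) {
        return new ReplicaInfo(blockId, "RBW", new File(rbwDir, "blk_" + blockId).toURI());
    }
}

// Volume backed by external storage: resolves the block name against a base URI.
class RemoteVolume implements Volume {
    private final URI base; // expected to end with "/" so resolve() appends to it

    RemoteVolume(URI base) { this.base = base; }

    @Override
    public ReplicaInfo createRbw(long blockId) {
        return new ReplicaInfo(blockId, "RBW", base.resolve("blk_" + blockId));
    }
}

final class Dataset {
    // The dataset picks a volume and delegates; it never constructs a File itself.
    static ReplicaInfo createRbw(Volume volume, long blockId) {
        return volume.createRbw(blockId);
    }
}
```

The dataset-level code is identical for both kinds of volume, which is the flexibility described above.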

> Abstract implementation-specific details from the datanode
> ----------------------------------------------------------
>                 Key: HDFS-9809
>                 URL: https://issues.apache.org/jira/browse/HDFS-9809
>             Project: Hadoop HDFS
>          Issue Type: Task
>            Reporter: Virajith Jalaparti
> Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FsVolumeImpl etc.) implicitly
assume that blocks are stored in java.io.File(s) and that volumes are divided into directories.
We propose to abstract these details, which would help in supporting other storages. 

This message was sent by Atlassian JIRA
