hadoop-hdfs-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-1687) HDFS Federation: DirectoryScanner changes for federation
Date Tue, 01 Mar 2011 23:24:36 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated HDFS-1687:
-----------------------------

    Attachment: HDFS-1687_DirScan_v1.patch

test-patch results:

+1 @author.  The patch does not contain any @author tags.
+1 tests included.  The patch appears to include 3 new or modified tests.
+1 javadoc.  The javadoc tool did not generate any warning messages.
+1 javac.  The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
-1 release audit.  The applied patch generated 99 release audit warnings (more than the trunk's current 98 warnings).
+1 system test framework.  The patch passed system test framework compile.

I couldn't tell where the one "extra" release audit warning came from: none of the 99 affected
files were files I had changed, and the only .java files flagged were in the
"src/contrib/thriftfs" directories.


> HDFS Federation: DirectoryScanner changes for federation
> --------------------------------------------------------
>
>                 Key: HDFS-1687
>                 URL: https://issues.apache.org/jira/browse/HDFS-1687
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: Federation Branch
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>             Fix For: Federation Branch
>
>         Attachments: HDFS-1687_DirScan_v1.patch
>
>
> DirectoryScanner scans substantially all of the directory tree of entire volumes.  It
> needs to be extended to work with Blockpools in Federation.
> Design notes:
> 1. The subdirectories of active bpids will be scanned.  Active bpids are those associated
> with currently connected Namenodes.  Each Volume knows the set of all active bpids, via
> volume.map.keySet().  I'll add a package-private accessor in FSVolume to return the set of
> active bpids for use by DirectoryScanner, DataBlockScanner, etc.  DirectoryScanner will
> ignore the subdirectories of inactive bpids; see item 3.
> 2. There is no need to compare the volume's set of active bpids with the global set, because
> the way the code works, they really can't be different.  If differences do arise, they will be
> automatically fixed by the next restart of either the Datanode or the Namenode.
> 3. Inactive bpids will be ignored.  Until we are connected to the owner Namenode, we
> cannot know whether a bpid subdirectory is correctly formatted, has snapshot data, etc.  So
> it doesn't make sense to try to manage the data under an inactive bpid.
> 4. DirectoryScanner is currently instantiated and periodically triggered by DataBlockScanner.
> Other than both being "scanners", these two modules have little in common, and the triggering
> code is confusing.  (DirectoryScanner scans filesystem directory trees every hour, to detect
> and fix inconsistencies between disk directories and ReplicasMap.  DataBlockScanner runs every
> 3 weeks, and traverses all block files, actually reading them out and checksumming them to
> detect block corruption.)
> Separating them, and running DirectoryScanner under its own periodic scheduler, is a
> small change that will make the code much clearer.  It already runs on its own FixedThreadPool
> Executor, so it is easy to change it to a ScheduledThreadPool, and instantiate it from
> DataNode.postStartInit() at the same time as initBlockScanner() is called.
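
The design notes above could be sketched roughly as follows. This is only an illustration, not the code in the attached patch: VolumeSketch and DirectoryScannerSketch are hypothetical stand-ins for FSVolume and DirectoryScanner, the accessor name getActiveBlockPoolIds() is assumed, and the per-bpid value in the map is reduced to a directory string. It shows a package-private accessor that exposes the active bpids and a scanner that runs on its own ScheduledThreadPool and visits only active bpids.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for FSVolume; the real class is not shown in this message.
class VolumeSketch {
    // bpid -> per-block-pool state (reduced here to a directory path; an assumption)
    private final Map<String, String> map = new ConcurrentHashMap<>();

    void addBlockPool(String bpid, String dir) { map.put(bpid, dir); }

    void removeBlockPool(String bpid) { map.remove(bpid); }

    // Package-private accessor (name assumed): read-only snapshot of the active bpids.
    Set<String> getActiveBlockPoolIds() {
        return Collections.unmodifiableSet(new HashSet<>(map.keySet()));
    }
}

// Hypothetical stand-in for DirectoryScanner, driven by its own periodic scheduler.
class DirectoryScannerSketch implements Runnable {
    private final VolumeSketch volume;
    private final Set<String> lastScanned = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(1);

    DirectoryScannerSketch(VolumeSketch volume) { this.volume = volume; }

    @Override
    public void run() {
        lastScanned.clear();
        // Only active bpids are visited; inactive bpid subdirectories are ignored.
        for (String bpid : volume.getActiveBlockPoolIds()) {
            // A real scanner would walk the bpid subdirectory here and reconcile
            // it with the in-memory replica map; this sketch only records the visit.
            lastScanned.add(bpid);
        }
    }

    // Schedule periodic scans; the period is a caller-supplied assumption.
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this, 0, periodSeconds, TimeUnit.SECONDS);
    }

    void shutdown() { scheduler.shutdownNow(); }

    Set<String> getLastScanned() { return new HashSet<>(lastScanned); }
}
```

In this sketch a caller would construct the scanner once during Datanode startup and call start(3600) for an hourly scan; returning a snapshot from the accessor keeps the scanner from iterating over a key set that changes while Namenodes connect or disconnect.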

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
