hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2423) The codes in FSDirectory.mkdirs(...) is inefficient.
Date Mon, 03 Mar 2008 23:24:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574786#action_12574786
] 

Konstantin Shvachko commented on HADOOP-2423:
---------------------------------------------

This patch actually does 2 things. 
# It introduces a class that caches INodes and local names on the path.
# Optimizes the mkDirs() using this class.

I agree both tasks are important, but if you just want to optimize mkDirs() as stated above
then it could be done much easier.
The INode.getExistingPathINodes() should be made package private rather than INode-private.
And then you can get all components of the path and all existing inodes along the path by
{code}
FSDirectory.mkDirs(src, ... ) {
    byte[][] components = INode.getPathComponents(src);
    INode[] inodes = new INode[components.length];
    rootDir.getExistingPathINodes(components, inodes);

    // create missing inodes
    ...............
}
{code}
After that you just find first null in the array of inodes, and start adding directories using
addNode() and fill in the null elements one by one.
You don't even need to start from the rootDir for every component.

Introducing the caching class is important but will take more effort, because we would probably
want to to convert all String-path 
parameters to it and use across all name-node parts. This is going to be a big change, and
it will require proper testing,
benchmarking, etc.

IMO optimizing mkDirs is important by itself, since it is extensively used by our clients.

> The codes in FSDirectory.mkdirs(...) is inefficient.
> ----------------------------------------------------
>
>                 Key: HADOOP-2423
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2423
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.15.1
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 2423_20080130.patch, 2423_20080303.patch
>
>
> FSDirectory.mkdirs(...) creates List<String> v to store all dirs.  e.g.
> {code}
> //Suppose 
> src = "/foo/bar/bas/"
> //Then,
> v = {"/", "/foo", "/foo/bar", "/foo/bar/bas"}
> {code}
> For each directory string *cur* in v, no matter *cur* already exists or not, it will
try to do a unprotectedMkdir(cur, ...).  Then, *cur* is parsed to byte[][] in INodeDirectory.addNode
(...).
> We don't need to do the parsing for each string in v.  Instead, byte[][] should be stored.
 Also, the loop should not continue once it finds an existing subdirectory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message