hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1140) Speedup INode.getPathComponents
Date Wed, 23 Jun 2010 00:50:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881498#action_12881498 ]

Konstantin Shvachko commented on HDFS-1140:

Some review comments:
# {{FSImage.isParent(String, String)}} is not used, please remove.
# Could you please add separators between the methods, and JavaDoc descriptions for the new
methods, if possible?
# {{INode.getPathFromComponents()}} should be {{DFSUtil.byteArray2String()}}.
# {{TestPathComponents}} should use JUnit 4 style rather than JUnit 3.
# I'd advise reusing {{U_STR}} instead of allocating {{DeprecatedUTF8 buff}} directly in
In order to do that you can provide a convenience method similar to {{readString()}} or {{readBytes()}}:
static byte[][] readPathComponents(DataInputStream in) throws IOException {
  U_STR.readFields(in);  // fill the shared buffer first, as readString() does
  return DFSUtil.bytes2byteArray(U_STR.getBytes(), U_STR.getLength(), (byte)Path.SEPARATOR_CHAR);
}
The idea was to remove DeprecatedUTF8 at some point, so it is better to keep this stuff in
one place right after the declaration of U_STR.
# It does not look like {{FSDirectory.addToParent(String src ...)}} is used anywhere anymore.
Could you please verify and remove it if so.
# Same with {{INodeDirectory.addToParent(String path, ...)}} - can we eliminate it too?
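
To illustrate item 5, here is a minimal sketch of splitting a raw byte buffer into path components without going through a String first. It is not the actual {{DFSUtil.bytes2byteArray()}}: the class and method names are hypothetical, and this version simply skips empty components, which may differ from how the real method treats a leading separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not part of HDFS).
final class PathBytesSketch {
  /**
   * Splits the first len bytes of buf on separator.
   * Assumption: empty components (leading or doubled separators) are skipped.
   */
  static byte[][] splitComponents(byte[] buf, int len, byte separator) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= len; i++) {
      // A component ends at a separator or at the end of the valid range.
      if (i == len || buf[i] == separator) {
        if (i > start) {
          parts.add(Arrays.copyOfRange(buf, start, i));
        }
        start = i + 1;
      }
    }
    return parts.toArray(new byte[0][]);
  }
}
```

Passing the length explicitly matters when the buffer is reused between reads, as with {{U_STR}}: the backing array can be longer than the data currently in it.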

> Speedup INode.getPathComponents
> -------------------------------
>                 Key: HDFS-1140
>                 URL: https://issues.apache.org/jira/browse/HDFS-1140
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>            Priority: Minor
>         Attachments: HDFS-1140.2.patch, HDFS-1140.3.patch, HDFS-1140.patch
> When the namenode is loading the image, a significant amount of time is spent
in DFSUtil.string2Bytes. We have a very specific workload here: the path that the namenode
calls getPathComponents for shares N - 1 components with the previous path this method was called
for (assuming the current path has N components).
> Hence we can improve the image load time by caching the result of previous conversion.
> We thought of using a simple LRU cache for components, but in practice String.getBytes
gets optimized at runtime and an LRU cache doesn't perform as well. However, keeping just the
latest path components and their translation to bytes in two arrays gives quite a performance boost.
> I could cut the image load time on our cluster by another 20% (30 seconds
vs. 24), and I wrote a simple benchmark that tests performance with and without caching.
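
The caching idea in the description can be sketched roughly as follows. This is not the code from the attached patches: the class name, field names, and the conversion counter are all hypothetical, and the sketch assumes single-threaded use, as during image loading. It keeps only the components of the previous path and their byte translations in two parallel arrays, and reuses a byte array whenever the component at the same position is unchanged.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a "last path only" cache (not from the patch).
final class LastPathCache {
  private String[] lastNames = new String[0];  // components of the previous path
  private byte[][] lastBytes = new byte[0][];  // their byte translations
  int conversions = 0;                         // counts getBytes calls, for illustration

  /** Returns the path components as bytes; returned arrays are shared, do not modify them. */
  byte[][] getPathComponents(String path) {
    String[] names = path.split("/");          // "/a/b" -> ["", "a", "b"]
    byte[][] bytes = new byte[names.length][];
    for (int i = 0; i < names.length; i++) {
      if (i < lastNames.length && names[i].equals(lastNames[i])) {
        bytes[i] = lastBytes[i];               // same component as last time: reuse
      } else {
        bytes[i] = names[i].getBytes(StandardCharsets.UTF_8);
        conversions++;
      }
    }
    lastNames = names;
    lastBytes = bytes;
    return bytes;
  }
}
```

With paths visited in directory order, such as /user/foo/a followed by /user/foo/b, only the last component needs a fresh conversion on the second call.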

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
