hadoop-hdfs-dev mailing list archives

From "Dmytro Molkov (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1140) Speedup INode.getPathComponents
Date Mon, 10 May 2010 22:02:30 GMT
Speedup INode.getPathComponents

                 Key: HDFS-1140
                 URL: https://issues.apache.org/jira/browse/HDFS-1140
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Dmytro Molkov

When the namenode loads the image, a significant amount of time is spent in
DFSUtil.string2Bytes. Our workload here is very specific: the path that the namenode calls
getPathComponents on shares N - 1 components with the path from the previous call
(assuming the current path has N components).
Hence we can improve image load time by caching the result of the previous conversion.
We considered a simple LRU cache for components, but in practice String.getBytes
gets optimized at runtime and an LRU cache does not perform as well. Keeping just the
latest path components and their byte translations in two arrays, however, gives quite a
performance boost.
I could get another 20% off the time to load the image on our cluster (30 seconds vs 24),
and I wrote a simple benchmark that tests performance with and without caching.
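
The two-array idea described above could look roughly like the sketch below. It is
only an illustration and not the actual patch: the class name PathComponentCache and
its fields are hypothetical, the path is assumed to be absolute and not the root, and
plain String.getBytes with UTF-8 stands in for DFSUtil.string2Bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: remembers only the components of the
// most recently converted path, in two parallel arrays.
class PathComponentCache {
    private String[] lastNames = new String[0];  // components of the previous path
    private byte[][] lastBytes = new byte[0][];  // their UTF-8 encodings

    // Splits an absolute, non-root path on '/' and converts each component to bytes.
    // A component equal to the one at the same position in the previous path
    // reuses the byte array that was produced for it last time.
    byte[][] getPathComponents(String path) {
        String[] names = path.split("/");
        byte[][] bytes = new byte[names.length][];
        for (int i = 0; i < names.length; i++) {
            if (i < lastNames.length && names[i].equals(lastNames[i])) {
                bytes[i] = lastBytes[i];  // cache hit: skip the conversion
            } else {
                bytes[i] = names[i].getBytes(StandardCharsets.UTF_8);
            }
        }
        // The current path becomes the cache for the next call.
        lastNames = names;
        lastBytes = bytes;
        return bytes;
    }
}
```

With consecutive calls for "/user/data/file1" and "/user/data/file2", only the last
component is converted on the second call. Note that the sketch hands out shared byte
arrays, so callers must not modify them, and it is not thread-safe; whether the
string comparison is actually cheaper than the conversion would have to be confirmed
with a benchmark on the real workload.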

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
