hadoop-hdfs-issues mailing list archives

From "Stephen O'Donnell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading
Date Mon, 22 Jul 2019 16:11:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890285#comment-16890285 ]

Stephen O'Donnell commented on HDFS-13693:

I think this performance improvement is a great discovery, but the change does carry some
future risk: if something changes in how the image is loaded, it would be easy to miss
this optimization. However, most changes involve some risk, and this does give a decent speed
improvement, so it's probably worth it.

I tried this change in my testing around loading the fsimage in parallel in HDFS-14617. I
found that in the single-threaded case, the load time improved by about 35 seconds (326
to 291 seconds for just the directory section load time), but when I moved to parallel loading
(4 threads), this change had negligible impact. That is probably because the work was spread
out over more threads and there are other points of serialization that slow things down.

I am happy for this to go in but thought it was worth highlighting the above.

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -------------------------------------------------------------------------
>                 Key: HDFS-13693
>                 URL: https://issues.apache.org/jira/browse/HDFS-13693
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: zhouyingchao
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, HDFS-13693-003.patch,
>                      HDFS-13693-004.patch, HDFS-13693-005.patch
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added to their
> parent INode's map one by one. The add procedure searches for a position in the parent's
> map and then inserts the child at that position. However, during image loading the search
> is unnecessary, since the insert position should always be at the end of the map given the
> order in which the children are serialized on disk.
> Testing this patch against the fsimage of a 70 PB cluster (200 million files and 300 million
> blocks), the image loading time was reduced from 1210 seconds to 1138 seconds. So it can
> reduce the loading time by up to about 10%.
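
To illustrate the idea in the description above, here is a minimal sketch of the two insert
paths. It is not the code from the attached patches: the class SortedChildren, the method
appendChild, and the use of String names ordered by compareTo are all assumptions made for
illustration (the real INodeDirectory keeps INode objects and compares names differently).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a directory's sorted child list; not the real INodeDirectory.
class SortedChildren {
    private final List<String> names = new ArrayList<>();

    /** General path: binary-search for the insert position, then insert there. */
    boolean addChild(String name) {
        int idx = Collections.binarySearch(names, name);
        if (idx >= 0) {
            return false; // a child with this name already exists
        }
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        names.add(-(idx + 1), name);
        return true;
    }

    /**
     * Load path (hypothetical name): the caller promises that children arrive in
     * ascending order, so the insert position is always the end and no search is needed.
     */
    void appendChild(String name) {
        // One comparison against the last element guards the ordering assumption.
        if (!names.isEmpty() && names.get(names.size() - 1).compareTo(name) >= 0) {
            throw new IllegalStateException("child out of order: " + name);
        }
        names.add(name);
    }

    /** Read-only view of the children in sorted order. */
    List<String> list() {
        return Collections.unmodifiableList(names);
    }
}
```

Feeding the same ascending sequence of names through addChild and appendChild should leave
both lists identical; the only difference is that appendChild skips the binary search. The
guard in appendChild is one way to address the future-risk concern raised in the comment: if
the load order ever stops being sorted, the sketch fails loudly instead of silently
corrupting the list.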
