hadoop-hdfs-issues mailing list archives

From "He Xiaoqiao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading
Date Mon, 01 Jul 2019 10:32:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876088#comment-16876088

He Xiaoqiao commented on HDFS-13693:

Thanks [~leosun08], [~sinago] for the interesting work. Some nits:
1. This seems to impose a constraint on INodeDirectorySection serialization. After applying this
patch, child inodes must be serialized in sorted order; otherwise a child inode that is out of
order could not be found by #binarySearch. I believe the current #serializeINodeDirectorySection
already does this, but nothing prevents someone from changing that logic later. I am concerned
that someone else may want to improve it, so would it be better to add an annotation or some
other guard against this situation?
2. Could INodeReference be optimized the same way?
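A small Java sketch of the ordering concern in point 1, assuming plain String names stand in for INodes. It is not HDFS code: `SortedChildrenGuard`, `contains`, and `checkSortedForSerialization` are hypothetical names. It shows that a binary-search lookup can miss a child that is present when the list is out of order, and one possible fail-fast check that serialization code could run.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical illustration: child names are plain Strings, not HDFS INodes.
 * A binary-search lookup only works if children were serialized (and
 * therefore loaded) in sorted order.
 */
public class SortedChildrenGuard {

  /** Hypothetical guard: fail fast if children are not strictly ascending. */
  static void checkSortedForSerialization(List<String> names) {
    for (int i = 1; i < names.size(); i++) {
      if (names.get(i - 1).compareTo(names.get(i)) >= 0) {
        throw new IllegalStateException(
            "child '" + names.get(i) + "' is out of order at index " + i);
      }
    }
  }

  /** Lookup by binary search, which assumes the list is sorted. */
  static boolean contains(List<String> children, String name) {
    return Collections.binarySearch(children, name) >= 0;
  }

  public static void main(String[] args) {
    List<String> sorted = Arrays.asList("a", "b", "c", "d", "e");
    List<String> unsorted = Arrays.asList("e", "a", "b", "c", "d");

    checkSortedForSerialization(sorted);          // passes silently
    System.out.println(contains(sorted, "e"));    // true
    System.out.println(contains(unsorted, "e"));  // false: "e" is present but not found

    try {
      checkSortedForSerialization(unsorted);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A check like this costs one pass over the children, so it could run in tests or behind an assertion rather than on every image save.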
Thanks [~leosun08] for working on this issue.

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -------------------------------------------------------------------------
>                 Key: HDFS-13693
>                 URL: https://issues.apache.org/jira/browse/HDFS-13693
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: zhouyingchao
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added to their
parent INode's map one by one. The add procedure searches for a position in the parent's
map and then inserts the child at that position. However, during image loading, the search is
unnecessary, since the insert position should always be at the end of the map given the order
in which the children are serialized on disk.
> Testing this patch against an fsimage from a 70 PB cluster (200 million files and 300 million
blocks) reduced image loading time from 1210 seconds to 1138 seconds, about 6% in this test.
The reduction can be up to about 10% of loading time.
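A simplified Java sketch of the idea in the issue description, assuming String names in place of INodes and an ArrayList as the child container. `ChildListSketch`, `addChild`, and `addChildAtLoading` are hypothetical and are not the actual INodeDirectory or FSImageFormatPBINode code. The general path binary-searches for the insert position; the loading path relies on the on-disk order and simply appends.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified model of a directory's sorted child list. */
public class ChildListSketch {
  private final List<String> children = new ArrayList<>();

  /** General path: binary-search for the position, then insert (shifting later elements). */
  boolean addChild(String name) {
    int pos = Collections.binarySearch(children, name);
    if (pos >= 0) {
      return false; // a child with this name already exists
    }
    children.add(-pos - 1, name); // -(insertion point) - 1 decodes to the insertion point
    return true;
  }

  /**
   * Image-loading path: children arrive in the order they were serialized,
   * assumed to be sorted, so append at the end without searching.
   */
  void addChildAtLoading(String name) {
    children.add(name);
  }

  List<String> children() {
    return Collections.unmodifiableList(children);
  }

  public static void main(String[] args) {
    String[] onDisk = {"alpha", "beta", "gamma", "zeta"}; // already in sorted order
    ChildListSketch searched = new ChildListSketch();
    ChildListSketch appended = new ChildListSketch();
    for (String name : onDisk) {
      searched.addChild(name);
      appended.addChildAtLoading(name);
    }
    System.out.println(searched.children().equals(appended.children())); // true
  }
}
```

For sorted input both paths build the same list, but the loading path skips one O(log n) search per child. If the on-disk order were ever violated, the append path would silently produce an unsorted list, which is the risk raised in point 1 of the comment; comparing each new name with the current last child is one O(1) way to detect that.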

This message was sent by Atlassian JIRA
