hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9441) Do not construct path string when choosing block placement targets
Date Wed, 02 Dec 2015 01:20:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035055#comment-15035055

Tsz Wo Nicholas Sze commented on HDFS-9441:

> ... but it is a bit odd to make an argument so generic ...

It is actually a common practice.  For examples,
- String.format
- Logger.info/debug/etc
- Preconditions.checkArgument(boolean expression, Object errorMessage)

Why Object is preferred than String?  It is because Object has a toString() method which,
by definition, returns a string representation of the object.  We don't need to convert the
object to string if it is not used.

There is a distinction in my patch -- the original code uses getName() but not toString().
 So implementations need to call getName() instead of toString() if the actual parameter is
a BlockCollection.

> ... Why don't we add a new method that accepts BlockCollection instead of String and
call it wherever applicable?  ...

In this case, we have two method declarations, one with String and one with BlockCollection.
 It works but not as good as passing Object.  Why?  Suppose the parameter is used in printing
error message.  The parameter may not be used since there is no error.  Then what does the
implementation need to do? 
# It may duplicate the code and copy the method body to both methods.  Code duplication is
# It may convert BlockCollection to String and then call the method with String parameter.
 This has performance issue which is been fixed by this JIRA.
# It may declare a third method with an Object parameter.  This is the same solution as the

> Do not construct path string when choosing block placement targets
> ------------------------------------------------------------------
>                 Key: HDFS-9441
>                 URL: https://issues.apache.org/jira/browse/HDFS-9441
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>            Priority: Minor
>         Attachments: h9441_20151118.patch, h9441_20151119.patch
> - INodeFile.getName() is expensive since it involves quite a few string operations. 
The method is called in both ReplicationWork and ErasureCodingWork but the default BlockPlacementPolicy
does not use the returned string.  We should simply pass BlockCollection to reduce unnecessary
computation when using the default BlockPlacementPolicy.
> - Another improvement: the return type of FSNamesystem.getBlockCollection should be changed
to INodeFile since it always returns an INodeFile object.

This message was sent by Atlassian JIRA

View raw message