hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11506) Configuration.get() is unnecessarily slow
Date Thu, 29 Jan 2015 22:19:36 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297807#comment-14297807

Andrew Wang commented on HADOOP-11506:

Happy to, thanks for working on this everyone. Review comments below, only one substantial
one. Core logic looks good though.

* The new recursion detection only handles a two-var loop, but the old one handles bigger
loops with the hash set. For instance, {{FOO1=FOO2, FOO2=FOO3, FOO3=FOO1}}. The behavior is
a little different thus; before, we'd return the start of the loop, here, we'd hit the IllegalStateException
at the end. I feel like the exception is arguably better, but compatibility dictates we preserve
the existing behavior.


* Not caused by your patch, but do you mind adding javadoc to substituteVars about the expected
input/output? Same for findSubVariable would be nice.
* Also good to document why we go through this dance rather than using a regex (can mention
this JIRA #)
* Did you use a return parameter for performance reasons? Eden allocation is pretty fast,
so just returning seems fine. With all the {{substring}} and string concat flying around,
we're doing a lot of allocation anyway.

Spacing is a bit off here:
      eval =   eval.substring(0, dollar)

remove "the"
    // that can occur in the nested class names

I think this should be a right brace rather than left for ultimate clarity:
      int afterRightBrace = varBounds[SUB_END_IDX] + "{".length();

Similar, I think {{"\{c"}} should be {{"c\}"}} for ultimate clarity:
         bracePos > 0 && bracePos + "{c".length() < eval.length();

> Configuration.get() is unnecessarily slow
> -----------------------------------------
>                 Key: HADOOP-11506
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11506
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Gera Shegalov
>         Attachments: HADOOP-11506.001.patch, HADOOP-11506.002.patch
> Profiling several large Hadoop jobs, we discovered that a surprising amount of time was
spent inside Configuration.get, more specifically, in regex matching caused by the substituteVars

This message was sent by Atlassian JIRA

View raw message