Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 1 Feb 2017 01:06:51 +0000 (UTC)
From: "Daniel Templeton (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.13038215.1485454102000.4659.1485911211757@Atlassian.JIRA>
In-Reply-To: <JIRA.13038215.1485454102000@Atlassian.JIRA>
References: <JIRA.13038215.1485454102000@Atlassian.JIRA> <JIRA.13038215.1485454102108@jira-lw-us.apache.org>
Subject: [jira] [Commented] (YARN-6125) The application attempt's diagnostic
 message should have a maximum size
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 01 Feb 2017 01:07:05 -0000


    [ https://issues.apache.org/jira/browse/YARN-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847836#comment-15847836 ] 

Daniel Templeton commented on YARN-6125:
----------------------------------------

Thanks for the patch.  Some comments:

* In the property, let's use "limit" instead of "capacity."  It sounds firmer. :)
* Instead of {{DEFAULT_APP_ATTEMPT_DIAGNOSTICS_CAPACITY_BYTES}}, I think {{DEFAULT_APP_ATTEMPT_DIAGNOSTICS_CAPACITY_KB}} might be better.  Then you can default to 64 instead of 65536.
* I can't comment on the POM change.  Anyone else want to comment?
* {{Lists.newLinkedList()}} is a Java 6 thing.  Just use the {{LinkedList}} constructor directly.
* I don't think we need the default constructor.
* You should do parameter validation in {{RMAppAttemptImpl}} before passing the capacity to the constructor.  You can offer a more informative error message that way.
* {{ensureNull()}} should probably be {{ensureNotNull()}}.
* I don't see any good reason to have {{inputLength}} be final.
* Your checks at the beginning of {{cutAtLeast()}}, {{checkAndCut()}}, and append will bring down the RM if they're violated.  That's bad.  You should probably log and ignore instead, probably at the level of the RM, instead of in the appender.  Except for {{checkAndCut()}}, where you should either trim down the new message to fit or allow the oversized message to be appended.  Probably the former.
* Your last append should probably just call the first append, and the first one should probably call the second one.
* Just leave out the javadoc on the overridden methods instead of stubbing it out.
* The description in {{yarn-default.xml}}, the description should maybe be "Defines the maximum capacity of the diagnostic message for each application attempt, in bytes.  When using ZooKeeper to store application state behavior, it's important to limit the size of the diagnostic messages to prevent YARN from overwhelming ZooKeeper.  In cases where yarn.resourcemanager.state-store.max-completed-applications is set to a large number, it may be desirable to reduce the value of this property to limit the total data stored."
* No need for the empty {{setup()}} in the test.
* {{initWithPositiveCapacitySuccess()}} should assert something. :)


> The application attempt's diagnostic message should have a maximum size
> -----------------------------------------------------------------------
>
>                 Key: YARN-6125
>                 URL: https://issues.apache.org/jira/browse/YARN-6125
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Daniel Templeton
>            Assignee: Andras Piros
>            Priority: Critical
>             Fix For: 3.0.0-alpha3
>
>         Attachments: YARN-6125.000.patch, YARN-6125.001.patch, YARN-6125.002.patch
>
>
> We've found through experience that the diagnostic message can grow unbounded.  I've seen attempts that have diagnostic messages over 1MB.  Since the message is stored in the state store, it's a bad idea to allow the message to grow unbounded.  Instead, there should be a property that sets a maximum size on the message.
> I suspect that some of the ZK state store issues we've seen in the past were due to the size of the diagnostic messages and not to the size of the classpath, as is the current prevailing opinion.
> An open question is how best to prune the message once it grows too large.  Should we
> # truncate the tail,
> # truncate the head,
> # truncate the middle,
> # add another property to make the behavior selectable, or
> # none of the above?


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org