Date: Wed, 1 Feb 2017 01:06:51 +0000 (UTC)
From: "Daniel Templeton (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-6125) The application attempt's diagnostic message should have a maximum size

    [ https://issues.apache.org/jira/browse/YARN-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847836#comment-15847836 ]

Daniel Templeton commented on YARN-6125:
----------------------------------------

Thanks for the patch. Some comments:
* In the property, let's use "limit" instead of "capacity." It sounds firmer. :)
* Instead of {{DEFAULT_APP_ATTEMPT_DIAGNOSTICS_CAPACITY_BYTES}}, I think {{DEFAULT_APP_ATTEMPT_DIAGNOSTICS_CAPACITY_KB}} might be better. Then you can default to 64 instead of 65536.
* I can't comment on the POM change. Anyone else want to comment?
* {{Lists.newLinkedList()}} is a Java 6 thing. Just use the {{LinkedList}} constructor directly.
* I don't think we need the default constructor.
* You should do parameter validation in {{RMAppAttemptImpl}} before passing the capacity to the constructor. You can offer a more informative error message that way.
* {{ensureNull()}} should probably be {{ensureNotNull()}}.
* I don't see any good reason to have {{inputLength}} be final.
* Your checks at the beginning of {{cutAtLeast()}}, {{checkAndCut()}}, and {{append()}} will bring down the RM if they're violated. That's bad. You should probably log and ignore instead, at the level of the RM rather than in the appender. The exception is {{checkAndCut()}}, where you should either trim down the new message to fit or allow the oversized message to be appended. Probably the former.
* Your last append should probably just call the first append, and the first one should probably call the second one.
* Just leave out the javadoc on the overridden methods instead of stubbing it out.
* In {{yarn-default.xml}}, the description should maybe be "Defines the maximum capacity of the diagnostic message for each application attempt, in bytes. When using ZooKeeper to store application state, it's important to limit the size of the diagnostic messages to prevent YARN from overwhelming ZooKeeper. In cases where yarn.resourcemanager.state-store.max-completed-applications is set to a large number, it may be desirable to reduce the value of this property to limit the total data stored."
* No need for the empty {{setup()}} in the test.
* {{initWithPositiveCapacitySuccess()}} should assert something. :)

> The application attempt's diagnostic message should have a maximum size
> -----------------------------------------------------------------------
>
>                 Key: YARN-6125
>                 URL: https://issues.apache.org/jira/browse/YARN-6125
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Daniel Templeton
>            Assignee: Andras Piros
>            Priority: Critical
>             Fix For: 3.0.0-alpha3
>
>         Attachments: YARN-6125.000.patch, YARN-6125.001.patch, YARN-6125.002.patch
>
>
> We've found through experience that the diagnostic message can grow unbounded. I've seen attempts that have diagnostic messages over 1MB. Since the message is stored in the state store, it's a bad idea to allow the message to grow unbounded.
> Instead, there should be a property that sets a maximum size on the message.
> I suspect that some of the ZK state store issues we've seen in the past were due to the size of the diagnostic messages and not to the size of the classpath, as is the current prevailing opinion.
> An open question is how best to prune the message once it grows too large. Should we
> # truncate the tail,
> # truncate the head,
> # truncate the middle,
> # add another property to make the behavior selectable, or
> # none of the above?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
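The review asks for a limit configured in KB, validation with an informative error before the value reaches the constructor, and an appender that logs and ignores bad input instead of bringing down the RM. A minimal Java sketch of those points follows. It assumes a hypothetical {{BoundedDiagnostics}} class (not the class in the attached patches) that counts size in Java characters as an approximation of bytes, trims an oversized new message to fit by keeping its beginning, and drops the oldest messages (truncating the head) once the limit is exceeded. {{java.util.logging}} stands in for Hadoop's own logging, and {{limitFromKb}} is an illustrative helper name.

```java
import java.util.LinkedList;
import java.util.logging.Logger;

/**
 * Hypothetical bounded diagnostics buffer (a sketch, not the class from the patch).
 * Sizes are counted in Java chars as an approximation of bytes.
 */
final class BoundedDiagnostics {
  private static final Logger LOG =
      Logger.getLogger(BoundedDiagnostics.class.getName());

  private final LinkedList<String> messages = new LinkedList<>();
  private final int limitChars;
  private int length = 0;

  /**
   * Validates a limit configured in KB before it reaches the constructor,
   * so the error message can name the offending property.
   */
  static int limitFromKb(String propertyName, int limitKb) {
    if (limitKb <= 0) {
      throw new IllegalArgumentException(propertyName
          + " must be a positive number of KB, but was " + limitKb);
    }
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    return Math.multiplyExact(limitKb, 1024);
  }

  /** Assumes the caller already validated limitChars (see limitFromKb). */
  BoundedDiagnostics(int limitChars) {
    this.limitChars = limitChars;
  }

  /** Appends a message; bad input is logged and ignored rather than thrown. */
  void append(String message) {
    if (message == null || message.isEmpty()) {
      LOG.fine("Ignoring null or empty diagnostic message");
      return;
    }
    if (message.length() > limitChars) {
      // Trim an oversized message to fit, keeping its beginning (an assumption).
      LOG.warning("Trimming diagnostic message of " + message.length()
          + " chars to the " + limitChars + "-char limit");
      message = message.substring(0, limitChars);
    }
    messages.addLast(message);
    length += message.length();
    // Truncate the head: evict the oldest messages until the total fits.
    while (length > limitChars) {
      length -= messages.removeFirst().length();
    }
  }

  int length() {
    return length;
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder(length);
    for (String m : messages) {
      sb.append(m);
    }
    return sb.toString();
  }
}
```

In this sketch the caller (for example {{RMAppAttemptImpl}}, as the review suggests) would read the KB value from configuration, pass it through {{limitFromKb}}, and only then construct the buffer, so the appender itself never has to reject its limit.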
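The issue leaves open how to prune an oversized message. The first three options can be sketched as plain string operations. The class and method names below are hypothetical, sizes are counted in Java characters rather than bytes, and the middle strategy assumes a marker string is inserted where text was dropped.

```java
/**
 * Hypothetical helpers for the pruning options listed in the issue.
 * Sizes are in Java chars; all methods assume maxChars >= 0.
 */
final class DiagnosticsTruncation {
  private DiagnosticsTruncation() {
  }

  /** Option 1, truncate the tail: keep the first maxChars characters. */
  static String truncateTail(String s, int maxChars) {
    return s.length() <= maxChars ? s : s.substring(0, maxChars);
  }

  /** Option 2, truncate the head: keep the last maxChars characters. */
  static String truncateHead(String s, int maxChars) {
    return s.length() <= maxChars ? s : s.substring(s.length() - maxChars);
  }

  /**
   * Option 3, truncate the middle: keep both ends joined by a marker.
   * The result, marker included, is never longer than maxChars.
   */
  static String truncateMiddle(String s, int maxChars, String marker) {
    if (s.length() <= maxChars) {
      return s;
    }
    if (maxChars <= marker.length()) {
      // No room for the marker plus any text; fall back to keeping the head.
      return truncateTail(s, maxChars);
    }
    int keep = maxChars - marker.length();
    int headLen = (keep + 1) / 2; // the head gets the extra char when keep is odd
    int tailLen = keep - headLen;
    return s.substring(0, headLen) + marker + s.substring(s.length() - tailLen);
  }
}
```

Truncating the head keeps the most recent diagnostics, truncating the tail keeps the earliest ones, and truncating the middle keeps both ends at the cost of the marker eating into the budget. Option 4 would amount to choosing among these from a configuration property.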