Date: Mon, 24 Mar 2014 15:02:04 +0000 (UTC)
From: "Mit Desai (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1670) aggregated log writer can write more log data then it says is the log length

    [ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945188#comment-13945188 ]

Mit Desai commented on YARN-1670:
---------------------------------

I realize that I created the patch against trunk before the earlier patch was committed, so it fails. I will upload a new one.

[~jeagles]
# Nice logic. This is much easier to understand. I will incorporate your suggestion in the new change.
# For the buffer size, you are correct. I already did some analysis on that.
I read some discussions/articles online which say that a 64K buffer size performs efficiently.


> aggregated log writer can write more log data then it says is the log length
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1670
>                 URL: https://issues.apache.org/jira/browse/YARN-1670
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 0.23.10, 2.2.0
>            Reporter: Thomas Graves
>            Assignee: Mit Desai
>            Priority: Critical
>             Fix For: 2.4.0
>
>         Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch
>
>
> We have seen exceptions when using 'yarn logs' to read log files:
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:441)
> at java.lang.Long.parseLong(Long.java:483)
> at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
> at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
> We traced it down to the reader trying to read the file type of the next file, but the position it reads from still contains log data from the previous file. What happened was that the Log Length was written as a certain size, but the log data was actually longer than that.
> Inside the write() routine in LogValue, it first writes the log file's length, but when it then writes the log itself it simply copies to the end of the file. There is a race condition here: if someone is still writing to the file when it gets aggregated, the length written could be too small.
> The write() routine should stop once it has written whatever it said was the length.
> It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this.
> We also noticed a bug in readAContainerLogsForALogType, where it uses an int for curRead whereas it should use a long:
> while (len != -1 && curRead < fileLength) {
> This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits the loop.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
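For reference, the fix the description calls for (writing the recorded length first, then copying at most that many bytes even if the source file has grown in the meantime) can be sketched roughly as below. This is an illustrative simplification, not the actual AggregatedLogFormat code; the class and method names are hypothetical, and it uses the 64K buffer size mentioned in the comment:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedLogCopy {
    // Hypothetical sketch: write the declared length header, then copy at
    // most that many bytes, so a file that grows during aggregation (the
    // race described in the issue) cannot overrun the recorded length.
    static long writeLogSegment(InputStream in, DataOutputStream out,
                                long recordedLength) throws IOException {
        out.writeLong(recordedLength);       // header: declared log length
        byte[] buf = new byte[64 * 1024];    // 64K buffer, per the analysis above
        long remaining = recordedLength;
        while (remaining > 0) {
            int toRead = (int) Math.min(buf.length, remaining);
            int n = in.read(buf, 0, toRead);
            if (n == -1) {
                break;                       // source ended early; stop copying
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return recordedLength - remaining;   // bytes actually copied
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];         // pretend the file grew to 100 bytes
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = writeLogSegment(new ByteArrayInputStream(data),
                                      new DataOutputStream(sink), 80);
        // Only the declared 80 bytes (plus the 8-byte length header) are written.
        System.out.println(copied + " " + sink.size());
    }
}
```

The reader trusts the length header, so bounding the copy this way keeps the header and the payload consistent; the trailing bytes written after the length was sampled are simply dropped (truncation the user is not told about, as the description notes).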
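The int-vs-long point about curRead can be shown with a small standalone sketch (not the actual reader code): once the running byte count passes Integer.MAX_VALUE, an int counter wraps negative, so the `curRead < fileLength` guard would never become false and only the `len != -1` check ends the loop:

```java
public class CurReadOverflow {
    public static void main(String[] args) {
        long fileLength = 2_200_000_000L;    // a ~2.05 GB aggregated log file
        long totalRead  = 2_200_000_000L;    // bytes consumed so far

        long curReadLong = totalRead;        // counter as a long, per the fix
        int  curReadInt  = (int) totalRead;  // counter as an int: overflows

        // With a long, the guard exits exactly at end of file.
        System.out.println(curReadLong < fileLength);  // false: loop terminates

        // With an int, the counter has wrapped negative, so the guard
        // stays true and the loop relies on len == -1 to stop.
        System.out.println(curReadInt);                // -2094967296
        System.out.println(curReadInt < fileLength);   // true: guard never fires
    }
}
```

This matches the observation in the description that the bug is latent today only because the decoder's end-of-stream handling happens to bail out first.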