Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-dev@hadoop.apache.org
Message-ID: <1231741939.1218530684544.JavaMail.jira@brutus>
Date: Tue, 12 Aug 2008 01:44:44 -0700 (PDT)
From: "Amar Kamat (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-2403) JobHistory log files contain data that cannot be parsed by org.apache.hadoop.mapred.JobHistory
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/HADOOP-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621734#action_12621734 ]

Amar Kamat commented on HADOOP-2403:
------------------------------------

I
think we should fix the general problems with history parsing, which are:

1) Detect whether a record is complete. The client can fail while writing to the history, and the failure can fall exactly on a key-val boundary, so a truncated record can still look well-formed.
2) Detect whether the key-val pairs are correct. An error message can contain tabs and other characters such as {{"}} that break history parsing. Currently a tab is the delimiter for records and a {{"}} is used for value encapsulation. Other strings in the history, such as counter names, can contain these characters too.

These problems can break HADOOP-3245.

I would go for a record _delimiter_ such as a {{.}} (dot) to detect whether a record is complete. Incomplete records should not be parsed and should be ignored. We also need to make sure that the characters used as delimiters ({{.}}, {{"}}, tab) do not occur in a _value_.

----
Thoughts?

> JobHistory log files contain data that cannot be parsed by org.apache.hadoop.mapred.JobHistory
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2403
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2403
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Amareshwari Sriramadasu
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: EncodeDecode.java, patch-2403.txt
>
>
> When some tasks fail, the job tracker writes a line to the history file with the error message.
> However, the error message may mess up the history file format, choking the history parser. Here is an example:
> MapAttempt TASK_TYPE="MAP" TASKID="tip_200712102254_0001_m_000090" TASK_ATTEMPT_ID="task_200712102254_0001_m_000090_0" TASK_STATUS="FAILED" FINISH_TIME="1197327293253" HOSTNAME="XXXX:50050" ERROR="java.lang.IllegalArgumentException: Trouble to get key or value (<,> substituted by null
> .  Key XML-Ori:
>

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
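
The proposal in the comment above (a record terminator plus keeping the reserved characters out of values) could be sketched roughly as follows. This is only an illustration and not the code in the attached patch: the class {{HistoryRecordSketch}}, its method names, the space-plus-dot terminator, the space between key-val pairs, and the backslash escaping scheme are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from org.apache.hadoop.mapred.JobHistory.
final class HistoryRecordSketch {
    // Assumed terminator: a space followed by a dot closes every record.
    static final String TERMINATOR = " .";

    // Backslash-escape the reserved characters so they never appear raw in a value.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '.':  sb.append("\\.");  break;
                case '\t': sb.append("\\t");  break;
                case '\n': sb.append("\\n");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverse of escape(): a backslash makes the next character literal (t and n map back).
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 1 < escaped.length()) {
                char next = escaped.charAt(++i);
                sb.append(next == 't' ? '\t' : next == 'n' ? '\n' : next);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Write one record: type, then KEY="escaped value" pairs, then the terminator.
    static String format(String recordType, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder(recordType);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(' ').append(e.getKey())
              .append("=\"").append(escape(e.getValue())).append('"');
        }
        return sb.append(TERMINATOR).toString();
    }

    // Every dot inside a value is preceded by a backslash, so a line that ends
    // with " ." can only have been closed by the writer.
    static boolean isComplete(String line) {
        return line.endsWith(TERMINATOR);
    }

    // Returns the fields of a complete record, or null so the caller can ignore it.
    static Map<String, String> parse(String line) {
        if (!isComplete(line)) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        String body = line.substring(0, line.length() - TERMINATOR.length());
        int i = body.indexOf(' '); // skip the record type
        while (i >= 0 && i < body.length()) {
            int eq = body.indexOf("=\"", i);
            if (eq < 0) {
                break;
            }
            String key = body.substring(i + 1, eq);
            int j = eq + 2;
            // Advance to the closing quote, stepping over escaped characters.
            while (j < body.length() && body.charAt(j) != '"') {
                if (body.charAt(j) == '\\') {
                    j++;
                }
                j++;
            }
            int end = Math.min(j, body.length());
            fields.put(key, unescape(body.substring(eq + 2, end)));
            i = j + 1;
        }
        return fields;
    }
}
```

With this scheme a reader would call {{parse}} on each line and skip any line for which it returns null, which covers the case where the writer died exactly on a key-val boundary. Escaping is only one option here; rejecting or replacing the reserved characters at write time would satisfy the same constraint.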