Date: Fri, 22 Feb 2013 21:56:12 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-4235) when outputting XML, OfflineEditsViewer can't handle some edits containing non-ASCII strings

    [ https://issues.apache.org/jira/browse/HDFS-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584742#comment-13584742 ]

Colin Patrick McCabe commented on HDFS-4235:
--------------------------------------------

Wikipedia has a good listing of unicode restrictions in XML here: http://en.wikipedia.org/wiki/Valid_characters_in_XML

> when outputting XML, OfflineEditsViewer can't handle some edits containing non-ASCII strings
> --------------------------------------------------------------------------------------------
>
Key: HDFS-4235 > URL: https://issues.apache.org/jira/browse/HDFS-4235 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Colin Patrick McCabe > Assignee: Colin Patrick McCabe > Priority: Minor > Attachments: HDFS-4235.001.patch > > > It seems that when outputting XML, OfflineEditsViewer can't handle some e= dits containing non-ASCII strings. > Example: > {code} > cmccabe@keter:/h> ./bin/hdfs oev -i ~/Downloads/current2/edits -o /tmp/u.= xml =20 > 17:11:24,662 ERROR OfflineEditsBinaryLoader:82 - Got IOException at posit= ion 10593 > Encountered exception. Exiting: SAX error: The character '=EF=BF=BD' is a= n invalid XML character > java.io.IOException: SAX error: The character '=EF=BF=BD' is an invalid X= ML character > at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisito= r.visitOp(XmlEditsVisitor.java:119) > at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBi= naryLoader.loadEdits(OfflineEditsBinaryLoader.java:78) > at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsVi= ewer.go(OfflineEditsViewer.java:142) > at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsVi= ewer.run(OfflineEditsViewer.java:228) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsVi= ewer.main(OfflineEditsViewer.java:237) > {code} > Probably, we forgot to properly escape and/or re-encode a filename before= putting it into the XML. The other processors (stats, binary) don't have = this problem, so it is purely an XML encoding issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrato= rs For more information on JIRA, see: http://www.atlassian.com/software/jira