Date: Sat, 18 Aug 2012 19:29:38 +1100 (NCT)
From: "Kihwal Lee (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-8239) Extend MD5MD5CRC32FileChecksum to show the actual checksum type being used

[ https://issues.apache.org/jira/browse/HADOOP-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437271#comment-13437271 ]

Kihwal Lee commented on HADOOP-8239:
------------------------------------

I think XML is fine.
XML parsing is done at the document level, so we can safely detect or ignore the extra parameter without worrying about the size of the data. I tried calling getFileChecksum() over Hftp between a patched 0.23 cluster and a 1.0.x cluster, and it worked fine both ways.

The change you suggested does not solve the whole problem. The magic number acts like a simple binary length field: its presence or absence tells you how much data to read. So the read side of the patched version works even when reading from an unpatched version, but the reverse is not true. The unpatched version will always leave something unread in the stream. XML is nice in that it inherently has begin and end markers and is not sensitive to size changes.

Since JsonUtil depends on these serialization/deserialization methods, I don't think we can obtain bidirectional compatibility by modifying only one side. If it had used XML and did not do the length check, it would have no such problem. A fully JSON-ized approach could have worked as well.

One approach I can think of is to leave the current readFields()/write() methods unchanged. I think only WebHdfs is using them, and if that is true, we can make WebHdfs actually send and receive everything in JSON format and keep the current "bytes" JSON field as is. When it does not find the "new" fields in data from an old source, it can do the old deserialization on "bytes". Similarly, it should send everything in individual JSON fields as well as the old serialized "bytes". It may be better to move the JSON util methods to MD5MD5CRC32FileChecksum.java, since they will have to know the internals of MD5MD5CRC32FileChecksum.
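A minimal sketch of the fallback idea above. This is not the actual Hadoop or JsonUtil API; the field names ("bytes", "crcType") and the class shape are assumptions chosen only to illustrate sending the new field alongside the legacy one, and falling back when an old sender omits it:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: new senders emit both the legacy opaque "bytes"
// field and an explicit "crcType" field; new readers use "crcType" when
// present and fall back to the legacy default when it is absent.
public class ChecksumJsonCompat {

    static Map<String, String> toJson(String bytesHex, String crcType) {
        Map<String, String> json = new HashMap<>();
        json.put("bytes", bytesHex);       // always sent, so old readers keep working
        if (crcType != null) {
            json.put("crcType", crcType);  // new field; old readers simply ignore it
        }
        return json;
    }

    static String crcTypeFrom(Map<String, String> json) {
        // New reader: prefer the explicit field from a new sender;
        // otherwise assume the legacy type, as an old sender would imply.
        String t = json.get("crcType");
        return (t != null) ? t : "CRC32";
    }

    public static void main(String[] args) {
        Map<String, String> fromNewSender = toJson("deadbeef", "CRC32C");
        Map<String, String> fromOldSender = toJson("deadbeef", null);
        System.out.println(crcTypeFrom(fromNewSender)); // CRC32C
        System.out.println(crcTypeFrom(fromOldSender)); // CRC32 (legacy fallback)
    }
}
```

Because "bytes" is always present, an unpatched peer sees exactly the payload it expects, which is what makes the compatibility bidirectional.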
> Extend MD5MD5CRC32FileChecksum to show the actual checksum type being used
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-8239
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8239
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 2.1.0-alpha
>
>         Attachments: hadoop-8239-after-hadoop-8240.patch.txt, hadoop-8239-before-hadoop-8240.patch.txt
>
> In order to support HADOOP-8060, MD5MD5CRC32FileChecksum needs to be extended to carry the information on the actual checksum type being used. The interoperability between the extended version and branch-1 should be guaranteed when Filesystem.getFileChecksum() is called over hftp, webhdfs or httpfs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira