From: "Todd Lipcon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Tue, 15 Jan 2013 22:04:13 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

    [ https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554423#comment-13554423 ]

Todd Lipcon commented on HDFS-4403:
-----------------------------------

bq. The old client code just called getCrcType() without checking hasCrcType(). The old clients talking to new server with this change, get null pointer exception, if the new server does not set the crcType

That's not how defaults work in protobuf: the protobuf maintains a separate flag for "hasFieldX", which is set to false upon construction even if the field has a default.
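To make the presence-flag behavior concrete without depending on the protobuf runtime, here is a minimal hand-rolled sketch. It is not protobuf's API: `PresenceDemo`, `Msg`, `set`, `has`, `get`, `serialize` and `parse` are hypothetical names, and the toy encoding only handles one length-delimited field (tag 1) shorter than 128 bytes. It models the scenario from the quoted comment: a new server whose message declares no default and never sets the field, and an old client whose local definition defaults to CRC32.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a generated proto2 message with one optional
// string field (tag 1). It only models presence-flag semantics; it is not
// the protobuf library.
class PresenceDemo {

    static final class Msg {
        private final String defaultValue; // default declared in *this side's* definition
        private boolean has;               // presence flag: false until explicitly set
        private String value;

        Msg(String defaultValue) { this.defaultValue = defaultValue; }

        void set(String v) { value = v; has = true; }
        boolean has() { return has; }
        String get() { return has ? value : defaultValue; }

        // Only an explicitly set field is written; the default never hits the wire.
        byte[] serialize() {
            if (!has) {
                return new byte[0];
            }
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(0x0A);          // tag 1, wire type 2 (length-delimited)
            out.write(utf8.length);   // assumes length < 128 (single-byte varint)
            out.write(utf8, 0, utf8.length);
            return out.toByteArray();
        }

        // Parses the layout written by serialize(); the reader keeps its own default.
        static Msg parse(byte[] bytes, String readerDefault) {
            Msg m = new Msg(readerDefault);
            if (bytes.length > 0) {
                int len = bytes[1];
                m.set(new String(bytes, 2, len, StandardCharsets.UTF_8));
            }
            return m;
        }
    }

    public static void main(String[] args) {
        // "New server": no default declared, field never set.
        Msg fromServer = new Msg(null);
        byte[] wire = fromServer.serialize();
        System.out.println("serialized length: " + wire.length);

        // "Old client": its local definition defaults the field to CRC32.
        Msg atClient = Msg.parse(wire, "CRC32");
        System.out.println("has field? " + atClient.has());
        System.out.println("field value: " + atClient.get());

        // When the server does set the field, the client sees the real value.
        Msg explicit = new Msg(null);
        explicit.set("CRC32C");
        System.out.println("explicitly set: " + Msg.parse(explicit.serialize(), "CRC32").get());
    }
}
```

Because nothing is written while the flag is false, the reader cannot tell "unset" from "never existed" and simply falls back to the default in its own definition; no null is involved.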
So, on the server side, if an optional field isn't explicitly set, its default value is not written to the wire when the message is serialized. In other words, since the field has always been optional, a server that doesn't set it is handled the same way whether or not the field declares a default.

To illustrate, I made the following test proto:

{code}
message MyTestProto {
  optional string testField = 1 [ default = "hello world" ];
}
{code}

And ran the following code:

{code}
import org.apache.hadoop.util.StringUtils;

// MyTestProto is the class protoc generates from the .proto above.
MyTestProto pb = MyTestProto.newBuilder().build();  // testField is never set
System.err.println("has field? " + pb.hasTestField());
System.err.println("field value: " + pb.getTestField());
System.err.println("serialized: " + StringUtils.byteToHexString(pb.toByteArray()));
{code}

Output:

{code}
has field? false
field value: hello world
serialized: 
{code}

Note how the default value isn't put on the "wire" when the protobuf is serialized to a byte array: the serialized output is empty. So, in your example, old clients talking to a new server that doesn't set the CRC type would continue to use whatever default they had defined locally.

> DFSClient can infer checksum type when not provided by reading first byte
> -------------------------------------------------------------------------
>
>                 Key: HDFS-4403
>                 URL: https://issues.apache.org/jira/browse/HDFS-4403
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Minor
>         Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the new protobuf field is optional, with a default of CRC32. This means that this API, when used against an older cluster (like earlier 0.23 releases), will falsely return CRC32 even if that cluster has written files with CRC32C. This can cause issues for distcp, for example.
>
> Instead of defaulting the protobuf field to CRC32, we can leave it with no default. Then, if the OpBlockChecksumResponseProto has no checksum type set, the client can send OP_READ_BLOCK to read the first byte of the block and grab the checksum type out of that response (which has always been present).
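The fallback proposed in the issue description can be sketched roughly as follows. All names here (`ChecksumTypeInference`, `BlockChecksumResponse`, `FirstByteReader`, `readChecksumTypeOfFirstByte`, `inferChecksumType`, the `"blk_1"` block id) are hypothetical stand-ins rather than the real DFSClient or data-transfer classes, and the actual OP_READ_BLOCK round trip is abstracted behind an interface.

```java
// Hypothetical sketch of the proposed client-side fallback. None of these
// types are the real DFSClient / data-transfer protocol classes.
class ChecksumTypeInference {

    enum ChecksumType { CRC32, CRC32C }

    // Stand-in for OpBlockChecksumResponseProto with a no-default optional field.
    static final class BlockChecksumResponse {
        private final ChecksumType crcType; // null models "not set by the server"

        BlockChecksumResponse(ChecksumType crcType) { this.crcType = crcType; }

        boolean hasCrcType() { return crcType != null; }
        ChecksumType getCrcType() { return crcType; }
    }

    // Stand-in for sending OP_READ_BLOCK for the first byte of a block and
    // reading the checksum type from that response.
    interface FirstByteReader {
        ChecksumType readChecksumTypeOfFirstByte(String blockId);
    }

    static ChecksumType inferChecksumType(BlockChecksumResponse resp,
                                          String blockId,
                                          FirstByteReader reader) {
        if (resp.hasCrcType()) {
            return resp.getCrcType();   // newer server reported the type directly
        }
        // Older server: don't assume CRC32; ask the block itself.
        return reader.readChecksumTypeOfFirstByte(blockId);
    }

    public static void main(String[] args) {
        int[] reads = {0};
        // Simulated cluster whose blocks were written with CRC32C.
        FirstByteReader crc32cCluster = blockId -> {
            reads[0]++;
            return ChecksumType.CRC32C;
        };

        ChecksumType fromNew = inferChecksumType(
                new BlockChecksumResponse(ChecksumType.CRC32C), "blk_1", crc32cCluster);
        System.out.println("new server: " + fromNew + " (extra reads: " + reads[0] + ")");

        ChecksumType fromOld = inferChecksumType(
                new BlockChecksumResponse(null), "blk_1", crc32cCluster);
        System.out.println("old server: " + fromOld + " (extra reads: " + reads[0] + ")");
    }
}
```

The design point is that an unset field is treated as "unknown" rather than silently mapped to CRC32, so only responses from older servers pay for the extra one-byte read.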