Message-ID: <2081347902.1237999497679.JavaMail.jira@brutus>
Date: Wed, 25 Mar 2009 09:44:57 -0700 (PDT)
From: "Hong Tang (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-5553) Change modifier of SequenceFile.CompressedBytes and SequenceFile.UncompressedBytes from private to public
In-Reply-To: <693703279.1237791410954.JavaMail.jira@brutus>
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm

[
https://issues.apache.org/jira/browse/HADOOP-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689155#action_12689155 ]

Hong Tang commented on HADOOP-5553:
-----------------------------------

bq. Since SequenceFile provides a public method nextRawValue(ValueBytes val), which accepts a ValueBytes param. And also ValueBytes is a public interface. So it seems it allows users to define their own ValueBytes implementations. But in nextRawValue(ValueBytes val), it casts the passed param into either CompressedBytes or UncompressedBytes. I do not think it makes any sense.

I think the current interface is a good example of how to hide information through the use of an interface. The intention of this interface is to allow people to pass value bytes to an OutputStream when they do not need to examine the deserialized object. At the same time, it hides the implementation detail that the value is read eagerly into a byte array.

bq. Actually i am trying to skip some bytes of the value part of one record, and not lazy load. Is there a possibility to allow SequenceFile to extend to support this?

That is a valid concern, but have you profiled your code to confirm that skipping bytes is indeed the right place to optimize? Have you tried the alternative of obtaining the serialized value bytes through a ByteArrayOutputStream and then skipping within the resulting byte[]?

Finally, a different way of achieving this goal (accessing raw bytes in a read-only fashion without exposing implementation details) is to return an InputStream for the value bytes. This is the approach taken by HADOOP-3315.
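To make the ByteArrayOutputStream alternative concrete, here is a minimal sketch of the idea. It does not depend on Hadoop: the nested ValueBytes interface is a stand-in that assumes only a writeUncompressedBytes(DataOutputStream) method, and the class name RawValueSkip and its helper methods are hypothetical names chosen for illustration. In real code the value would come from the SequenceFile reader rather than from a lambda.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class RawValueSkip {
    /** Stand-in for the single ValueBytes method this sketch relies on (assumption). */
    interface ValueBytes {
        void writeUncompressedBytes(DataOutputStream out) throws IOException;
    }

    /** Copies the serialized value into a byte[] through a ByteArrayOutputStream. */
    static byte[] toByteArray(ValueBytes value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        value.writeUncompressedBytes(out);
        out.flush();
        return buffer.toByteArray();
    }

    /** Returns a read-only stream over the value, positioned after the first skip bytes. */
    static DataInputStream openAfterSkipping(ValueBytes value, int skip) throws IOException {
        byte[] raw = toByteArray(value);
        if (skip < 0 || skip > raw.length) {
            throw new IllegalArgumentException("skip out of range: " + skip);
        }
        // Wrapping a sub-range of the array skips the prefix without another copy.
        return new DataInputStream(new ByteArrayInputStream(raw, skip, raw.length - skip));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical record layout: a 4-byte int header followed by a UTF string.
        ValueBytes value = out -> {
            out.writeInt(42);
            out.writeUTF("payload");
        };
        DataInputStream in = openAfterSkipping(value, 4); // skip the int header
        System.out.println(in.readUTF()); // prints "payload"
    }
}
```

The value is still read eagerly, so this only avoids deserializing the skipped prefix; whether that is enough is what profiling would show.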
> Change modifier of SequenceFile.CompressedBytes and SequenceFile.UncompressedBytes from private to public
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5553
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5553
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5553-2.patch, Hadoop-5553-3.patch, Hadoop-5553.patch
>
>
> SequenceFile.rawValue() provides the only interface to navigate the underlying bytes. With a little work, a customized ValueBytes implementation can avoid reading all bytes into memory. Unfortunately, the current nextRawValue casts the ValueBytes passed to it to either the private class CompressedBytes or the private class UncompressedBytes, which prevents further extension by users.
> I cannot see any reason that CompressedBytes and UncompressedBytes should be private. And since ValueBytes is public and nextValue() casts it to either CompressedBytes or UncompressedBytes, I think it would be better if they were public.
> I am stuck on this issue now and would really appreciate it if it could be resolved as soon as possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.