hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5553) Change modifier of SequenceFile.CompressedBytes and SequenceFile.UncompressedBytes from private to public
Date Tue, 24 Mar 2009 04:57:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688568#action_12688568 ]

Chris Douglas commented on HADOOP-5553:

I'll try to explain my position.

The SequenceFile.Reader creates the ValueBytes instance that is passed to nextRawValue. Passing
in an arbitrary ValueBytes object is contrary to that design, because nothing guarantees that
it observes the semantics the Reader expects (which is why the nextRaw* methods cast the
instance to a subclass). If even exchanging ValueBytes instances between Readers is not
guaranteed to work, then more general user code certainly should not be supported.
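
To make the cast concrete, here is a minimal sketch of the pattern. It is a simplified stand-in
and not the Hadoop source: the in-memory Reader, its record layout, and the helper
rejectsForeignValueBytes are assumptions made up for illustration. Only the names ValueBytes,
UncompressedBytes, createValueBytes, and nextRawValue mirror the discussion.

```java
import java.io.DataOutputStream;
import java.io.IOException;

/** Simplified stand-in for the cast-to-private-subclass pattern; not Hadoop's actual code. */
final class RawValueCastSketch {

    /** Public interface, loosely modeled on SequenceFile.ValueBytes. */
    interface ValueBytes {
        void writeUncompressedBytes(DataOutputStream out) throws IOException;
        int getSize();
    }

    /** Hypothetical reader over in-memory records. */
    static final class Reader {
        private final byte[][] records;
        private int next = 0;

        Reader(byte[]... records) {
            this.records = records;
        }

        /** Private implementation: only this Reader knows how to fill it. */
        private static final class UncompressedBytes implements ValueBytes {
            private byte[] data = new byte[0];

            void reset(byte[] record) {
                data = record.clone();
            }

            @Override
            public void writeUncompressedBytes(DataOutputStream out) throws IOException {
                out.write(data);
            }

            @Override
            public int getSize() {
                return data.length;
            }
        }

        ValueBytes createValueBytes() {
            return new UncompressedBytes();
        }

        /** Returns the value length, or -1 at end of input. */
        synchronized int nextRawValue(ValueBytes val) {
            // The cast under discussion: a foreign implementation fails right here.
            UncompressedBytes bytes = (UncompressedBytes) val;
            if (next >= records.length) {
                return -1;
            }
            bytes.reset(records[next++]);
            return bytes.getSize();
        }
    }

    /** True if the reader refuses a ValueBytes it did not create. */
    static boolean rejectsForeignValueBytes() {
        Reader reader = new Reader(new byte[] {1, 2, 3});
        ValueBytes foreign = new ValueBytes() {
            @Override
            public void writeUncompressedBytes(DataOutputStream out) { }

            @Override
            public int getSize() {
                return 0;
            }
        };
        try {
            reader.nextRawValue(foreign);
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader(new byte[] {1, 2, 3}, new byte[] {4, 5});
        ValueBytes val = reader.createValueBytes();
        System.out.println(reader.nextRawValue(val));   // 3
        System.out.println(reader.nextRawValue(val));   // 2
        System.out.println(rejectsForeignValueBytes()); // true
    }
}
```

Making the subclass public would remove the compile-time barrier but not the cast, so a
user-written ValueBytes would still have to satisfy whatever the Reader assumes about it.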

Consider how much wider the interface becomes when user code is permitted. Right now, SequenceFile.Reader
is thread-safe, because the full record is consumed in nextRaw. If a lazy reader were to block
that stream until the full value were consumed, it would introduce the possibility of deadlock
(if it didn't, its results would be undefined). Lazy ValueBytes A might block, while lazy
ValueBytes B might page to disk if there's contention. Threads might pass a mix of ValueBytes
instances to the same Reader, some written by the user, others from library code. In my mind,
lazily loading ValueBytes is a worthy feature for SequenceFile.Reader, not one of a set of
possible extensions to a binary interface. Again, I can't think of a second such extension.
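
The "undefined results" case can be shown without threads. The sketch below is an illustration
only: NaiveLazyValue and readOutOfOrder are hypothetical names, and the two-byte values with no
record framing are an assumption chosen to keep it short. It shows a lazy value that reads from
a shared stream without coordinating with whoever owns the stream position.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

/** Illustration only: a lazy value that does not coordinate with the stream's owner. */
final class UncoordinatedLazyReadSketch {

    /** Hypothetical user-written lazy value: touches the shared stream only when asked. */
    static final class NaiveLazyValue {
        private final DataInputStream shared;
        private final int length;

        NaiveLazyValue(DataInputStream shared, int length) {
            this.shared = shared;
            this.length = length;
        }

        /** Reads `length` bytes from wherever the shared stream happens to be positioned. */
        byte[] materialize() throws IOException {
            byte[] data = new byte[length];
            shared.readFully(data);
            return data;
        }
    }

    /** Returns the bytes the *second* value sees when it is materialized first. */
    static byte[] readOutOfOrder() throws IOException {
        // Two 2-byte values stored back to back: {1, 1} followed by {2, 2}.
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(new byte[] {1, 1, 2, 2}));
        NaiveLazyValue first = new NaiveLazyValue(in, 2);
        NaiveLazyValue second = new NaiveLazyValue(in, 2);

        byte[] seenBySecond = second.materialize(); // stream is still at the first value
        first.materialize();                        // consumes what belonged to `second`
        return seenBySecond;
    }

    public static void main(String[] args) throws IOException {
        // The second value was meant to be {2, 2}, but it reads the first value's bytes.
        System.out.println(Arrays.toString(readOutOfOrder())); // [1, 1]
    }
}
```

Blocking the stream until each value is consumed would avoid the mix-up, at the cost of the
deadlock risk described above.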

As an alternative, consider implementing {{createValueBytes(boolean lazy)}}, returning a ValueBytes
instance that lazily reads the value. This permits SequenceFile to initialize any locks/state
necessary to support this, keeps the contract contained in SequenceFile, and adds only one
parameter to an advanced interface.
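
One possible shape for this, as a rough sketch under stated assumptions rather than a proposed
patch: the length-prefixed record layout, the ReaderBytes/EagerBytes/LazyBytes classes, the
"pending" field, and the fileOf/drain helpers are all hypothetical. A lazy value is also assumed
to be used with a single Reader. The point is only that the Reader owns the lock and decides
what happens to an unread lazy value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical sketch of createValueBytes(boolean lazy); not Hadoop's actual code. */
final class LazyValueBytesSketch {

    interface ValueBytes {
        void writeUncompressedBytes(DataOutputStream out) throws IOException;
        int getSize();
    }

    /** Reads records laid out as a 4-byte length followed by that many value bytes. */
    static final class Reader {
        private final DataInputStream in;
        private LazyBytes pending; // lazy value whose bytes are still unread in the stream

        Reader(byte[] file) {
            in = new DataInputStream(new ByteArrayInputStream(file));
        }

        private abstract static class ReaderBytes implements ValueBytes {
            int length;
            abstract void attach(Reader reader, int len) throws IOException;
            @Override public int getSize() { return length; }
        }

        private static final class EagerBytes extends ReaderBytes {
            private byte[] data = new byte[0];
            @Override void attach(Reader reader, int len) throws IOException {
                data = new byte[len];
                reader.in.readFully(data); // consume the value immediately
                length = len;
            }
            @Override public void writeUncompressedBytes(DataOutputStream out) throws IOException {
                out.write(data);
            }
        }

        private static final class LazyBytes extends ReaderBytes {
            private Reader owner;
            @Override void attach(Reader reader, int len) {
                owner = reader;
                length = len;
                reader.pending = this; // the reader remembers the unread value
            }
            @Override public void writeUncompressedBytes(DataOutputStream out) throws IOException {
                Reader reader = owner;
                if (reader == null) throw new IllegalStateException("not attached to a reader");
                synchronized (reader) { // same lock that guards nextRawValue
                    if (reader.pending != this) {
                        throw new IllegalStateException("value no longer available");
                    }
                    byte[] buf = new byte[length];
                    reader.in.readFully(buf);
                    reader.pending = null;
                    out.write(buf);
                }
            }
        }

        ValueBytes createValueBytes(boolean lazy) {
            return lazy ? new LazyBytes() : new EagerBytes();
        }

        /** Returns the value length, or -1 at end of input. */
        synchronized int nextRawValue(ValueBytes val) throws IOException {
            ReaderBytes bytes = (ReaderBytes) val;
            if (pending != null) { // skip an unread lazy value to reach the next record
                if (in.skipBytes(pending.length) != pending.length) {
                    throw new EOFException("truncated value");
                }
                pending = null;
            }
            if (in.available() == 0) return -1;
            bytes.attach(this, in.readInt());
            return bytes.getSize();
        }
    }

    static byte[] fileOf(byte[]... values) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (byte[] v : values) { out.writeInt(v.length); out.write(v); }
        return buf.toByteArray();
    }

    static byte[] drain(ValueBytes val) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        val.writeUncompressedBytes(new DataOutputStream(buf));
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Reader reader = new Reader(fileOf(new byte[] {1, 2, 3}, new byte[] {4, 5}));
        ValueBytes val = reader.createValueBytes(true);
        System.out.println(reader.nextRawValue(val)); // 3
        System.out.println(drain(val).length);        // 3, read from the stream on demand
        System.out.println(reader.nextRawValue(val)); // 2, left unread
        System.out.println(reader.nextRawValue(val)); // -1, the unread value was skipped
    }
}
```

In this sketch an unread lazy value is skipped and invalidated on the next read instead of
blocking the stream, which is one way to sidestep the deadlock concern; other policies are
possible, and the choice stays inside the Reader either way.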

> Change modifier of SequenceFile.CompressedBytes and SequenceFile.UncompressedBytes from private to public
> ---------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-5553
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5553
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>         Attachments: Hadoop-5553-2.patch, Hadoop-5553-3.patch, Hadoop-5553.patch
> SequenceFile.rawValue() provides the only interface to navigate the underlying bytes. With
> a little work, a customized ValueBytes implementation could avoid reading all of the bytes
> into memory. Unfortunately, the current nextRawValue casts the ValueBytes passed in to either
> the private class CompressedBytes or the private class UncompressedBytes, which prevents
> further extension by users.
> I cannot see any reason that CompressedBytes and UncompressedBytes should be private.
> Since ValueBytes is public and nextValue() casts it to either CompressedBytes or
> UncompressedBytes, I think it would be better if they were public.
> I am stuck on this issue now and would really appreciate it if it were resolved as soon as
> possible.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
