hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3414) Facility to query serializable types such as Writables for 'raw length'
Date Mon, 19 May 2008 22:58:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598126#action_12598126 ]

Doug Cutting commented on HADOOP-3414:

So we'd:
 - make Serializer an abstract class;
 - add a method:
 public int getSize() { return -1; }
 - override this in some simple classes, like Text and BytesWritable;
 - add a utility somewhere like:
public class LengthPrefixedSerializer<T> extends Serializer<T> {
  private DataOutputStream out;
  private DataOutputBuffer buffer = new DataOutputBuffer();
  private Serializer<T> serializer;
  private Serializer<T> bufferSerializer;

  public LengthPrefixedSerializer(Class<T> c, DataOutputStream out) throws IOException {
    this.out = out;
    serializer = SerializationFactory.getSerializer(c);
    serializer.open(out);            // writes straight to the output stream
    bufferSerializer = SerializationFactory.getSerializer(c);
    bufferSerializer.open(buffer);   // writes to the scratch buffer
  }

  public void serialize(T o) throws IOException {
    int size = o.getSize();
    if (size >= 0) {
      // can serialize directly w/o buffering
      WritableUtils.writeVInt(out, size);
      serializer.serialize(o);
    } else {
      // have to buffer before we can serialize
      buffer.reset();
      bufferSerializer.serialize(o);
      WritableUtils.writeVInt(out, buffer.getLength());
      out.write(buffer.getData(), 0, buffer.getLength());
    }
  }
}

Is that something like what you have in mind?
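
For anyone who wants to try the idea outside of Hadoop, here is a minimal, self-contained sketch using only java.io. It is an illustration under stated assumptions, not the proposed patch: SizedSerializer, BytesSerializer, Utf8Serializer, writeLengthPrefixed and encode are hypothetical names, getSize takes the value as an argument instead of being called on the object, and the length prefix is a fixed 4-byte int rather than a VInt.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixSketch {

  /** Hypothetical stand-in for an abstract Serializer that can report a size. */
  abstract static class SizedSerializer<T> {
    /** Serialized size in bytes, or -1 if unknown without serializing. */
    int getSize(T value) { return -1; }
    abstract void serialize(T value, DataOutputStream out) throws IOException;
  }

  /** Knows its size up front, so no intermediate buffer is needed. */
  static class BytesSerializer extends SizedSerializer<byte[]> {
    @Override int getSize(byte[] value) { return value.length; }
    @Override void serialize(byte[] value, DataOutputStream out) throws IOException {
      out.write(value);
    }
  }

  /** Keeps the default getSize() of -1, so the writer has to buffer. */
  static class Utf8Serializer extends SizedSerializer<String> {
    @Override void serialize(String value, DataOutputStream out) throws IOException {
      out.write(value.getBytes(StandardCharsets.UTF_8));
    }
  }

  /** Writes a 4-byte length followed by the serialized bytes. */
  static <T> void writeLengthPrefixed(SizedSerializer<T> s, T value, DataOutputStream out)
      throws IOException {
    int size = s.getSize(value);
    if (size >= 0) {
      // size known: write the prefix, then serialize directly
      out.writeInt(size);
      s.serialize(value, out);
    } else {
      // size unknown: serialize into a buffer first, then copy
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      s.serialize(value, new DataOutputStream(buffer));
      out.writeInt(buffer.size());
      buffer.writeTo(out);
    }
  }

  /** Convenience wrapper that returns the encoded record as a byte array. */
  static <T> byte[] encode(SizedSerializer<T> s, T value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      writeLengthPrefixed(s, value, out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    byte[] direct = encode(new BytesSerializer(), new byte[] {1, 2, 3});
    byte[] buffered = encode(new Utf8Serializer(), "abc");
    System.out.println(direct.length + " " + buffered.length);
  }
}
```

Both paths produce the same on-disk layout; the only difference is that the first one skips the extra copy, which is the saving the issue is after.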

> Facility to query serializable types such as Writables for 'raw length'
> -----------------------------------------------------------------------
>                 Key: HADOOP-3414
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3414
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io
>            Reporter: Arun C Murthy
> Currently we need to jump through hoops to get the 'raw length' of serializable types;
> e.g., SequenceFile.Writer.append needs to copy the key/value into a buffer and then check
> the buffer's size to figure out the record/key/value lengths. Obviously this can be improved to
> do away with the extra copy if we had types which could be queried for their raw length.
> Thoughts?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
