avro-dev mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-6) Better support for using customized in memory types with Avro GenericDatumReader and GenericDatumWriter
Date Sun, 12 Apr 2009 07:43:14 GMT

    [ https://issues.apache.org/jira/browse/AVRO-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698181#action_12698181 ]

Hong Tang commented on AVRO-6:
------------------------------

With the patch applied, here is how I might override a few methods to provide a
DatumReader/DatumWriter for PIG Tuples:

Tuple DatumWriter:
{code}
  @Override
  protected Object getField(Object record, String field, int position) {
    try {
      return ((Tuple) record).get(position);
    } catch (ExecException e) {
      throw new RuntimeException("Error getting datum in tuple ", e);
    }
  }

  @Override
  protected long getArraySize(Object array) {
    return ((DataBag) array).size();
  }

  @Override
  protected Iterator<? extends Object> getArrayElements(Object array) {
    return ((DataBag) array).iterator();
  }

  @Override
  protected boolean isString(Object datum) {
    return datum instanceof String;
  }

  @Override
  protected boolean isBytes(Object datum) {
    return datum instanceof DataByteArray;
  }

  @Override
  protected boolean isRecord(Object datum) {
    return datum instanceof Tuple;
  }

  @Override
  protected boolean isArray(Object datum) {
    return datum instanceof DataBag;
  }

  @Override
  protected void writeString(Object datum, ValueWriter out) throws IOException {
    out.writeUtf8(new Utf8((String) datum));
  }

  @Override
  protected void writeBytes(Object datum, ValueWriter out) throws IOException {
    out.writeBuffer(ByteBuffer.wrap(((DataByteArray) datum).get()));
  }
{code}

Tuple DatumReader:
{code}
  @Override
  protected void addField(Object record, String name, int position, Object o) {
    try {
      ((Tuple) record).set(position, o);
    } catch (ExecException e) {
      throw new RuntimeException("Error setting datum in tuple ", e);
    }
  }

  @Override
  protected Object getField(Object record, String name, int position) {
    try {
      return ((Tuple) record).get(position);
    } catch (ExecException e) {
      throw new RuntimeException("Error getting datum in tuple ", e);
    }
  }

  @Override
  protected void removeField(Object record, String field, int position) {
    try {
      ((Tuple) record).set(position, null);
    } catch (ExecException e) {
      throw new RuntimeException("Error setting datum in tuple ", e);
    }
  }
  
  @Override
  protected Object peekArray(Object array) {
    return null;
  }

  @Override
  protected void addToArray(Object array, Object e) {
    ((DataBag) array).add((Tuple) e);
  }

  @Override
  protected Object newRecord(Object old, Schema schema) {
    if ((old != null) && ((Tuple) old).size() == schema.getFields().size()) {
      return old;
    }

    Tuple retv = new DefaultTuple(); // TODO: change to use tuple factory
    for (int i = 0; i < schema.getFields().size(); ++i) {
      retv.append(null);
    }
    return retv;
  }
  
  @Override
  protected Object newArray(Object old, int size) {
    if (old != null) {
      ((DataBag) old).clear();
      return old;
    }
    return new DefaultDataBag(); // TODO: change to use bag factory
  }

  @Override
  protected Object readString(Object old, ValueReader in) throws IOException {
    return in.readUtf8(old);
  }

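  @Override
  protected DataByteArray readBytes(Object old, ValueReader in)
      throws IOException {
    // Offer the old value's backing array as the read buffer when there is one.
    ByteBuffer bb =
        in.readBuffer((old == null) ? null : ByteBuffer
            .wrap(((DataByteArray) old).get()));
    // Take only the bytes between position and limit, honoring the array offset.
    int start = bb.arrayOffset() + bb.position();
    return new DataByteArray(bb.array(), start, bb.arrayOffset() + bb.limit());
  }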
{code}
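
To illustrate the idea behind these hooks without depending on Avro or PIG, here is a minimal,
self-contained sketch. Everything in it is hypothetical: FieldAccessSketch, NameBasedWriter, and
PositionalWriter are stand-ins invented for this example, not classes from the patch. It only
shows the shape of the design: the base class passes both the field name and the position to an
overridable hook, so a subclass whose records support positional access can skip the name lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldAccessSketch {

  /** Hypothetical stand-in for a generic writer; not Avro's actual class. */
  static class NameBasedWriter {
    /** Hook: by default, look the field up by name in a Map-backed record. */
    protected Object getField(Object record, String name, int position) {
      return ((Map<?, ?>) record).get(name);
    }

    /** Visits the fields in schema order, delegating each access to the hook. */
    public List<Object> write(Object record, List<String> fieldNames) {
      List<Object> out = new ArrayList<>();
      for (int i = 0; i < fieldNames.size(); i++) {
        out.add(getField(record, fieldNames.get(i), i));
      }
      return out;
    }
  }

  /** Override for records that support positional access, such as a tuple. */
  static class PositionalWriter extends NameBasedWriter {
    @Override
    protected Object getField(Object record, String name, int position) {
      // The name is ignored; the position alone identifies the field.
      return ((List<?>) record).get(position);
    }
  }

  public static void main(String[] args) {
    List<String> fields = Arrays.asList("id", "name");

    // A name-addressed record, handled by the default hook.
    Map<String, Object> byName = new LinkedHashMap<>();
    byName.put("id", 1);
    byName.put("name", "a");

    // A position-addressed record, handled by the overriding hook.
    List<Object> byPosition = Arrays.<Object>asList(1, "a");

    System.out.println(new NameBasedWriter().write(byName, fields));
    System.out.println(new PositionalWriter().write(byPosition, fields));
  }
}
```

Both writers visit the same fields in the same order; only the lookup strategy differs, which is
the flexibility the overrides above depend on.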

> Better support for using customized in memory types with Avro GenericDatumReader and GenericDatumWriter
> -------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-6
>                 URL: https://issues.apache.org/jira/browse/AVRO-6
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.0
>            Reporter: Hong Tang
>             Fix For: 1.0
>
>         Attachments: avro-6.patch
>
>
> Currently Avro's GenericDatumReader/Writer requires that Record, Array, and Map be subclasses of GenericRecord, GenericArray, and Map. Additionally, STRING and BYTES are mapped to Utf8 and ByteBuffer. Finally, Record fields are accessed through field names, which may be less efficient if a user-defined record class supports field access by position (such as PIG Tuples).
> I suggest we improve the interface to (1) allow more flexibility in using user-defined types with Avro, and (2) support access to RECORDs by either field name or field position.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

