hadoop-common-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3413) SequenceFile.Reader doesn't use the Serialization framework
Date Mon, 19 May 2008 12:37:57 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597935#action_12597935 ]

Tom White commented on HADOOP-3413:

There is clearly still work to do before the serialization framework is fully supported in Hadoop,
but I don't think the lack of support in SequenceFile.Reader is a blocker for 0.17.0.

The work done in HADOOP-1986 enabled serialization support in the MapReduce kernel, so you
can use arbitrary types for keys and values. However, it is not yet possible to use arbitrary
types for map inputs or reduce outputs out of the box, since SequenceFile{Input|Output}Format
and SequenceFileRecord{Reader|Writer} are still Writable-based, as you point out. That said,
you can write your own InputFormat, OutputFormat, RecordReader, and RecordWriter implementations
to do this for you. For example, you can use SequenceFile.Writer#append(Object, Object) to
write arbitrary objects to a sequence file (using a Serializer), and the SequenceFile.Reader#nextRaw
methods to read out raw bytes, which you then deserialize manually with a Deserializer.
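
The write-serialized/read-raw/deserialize-manually pattern above can be sketched without Hadoop on the classpath. This is a stdlib-only analogue under stated assumptions: Java Serialization stands in for a Hadoop Serializer/Deserializer, the length-prefixed record layout is invented for illustration and is NOT the SequenceFile format, and RawRecordSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stdlib-only analogue; not Hadoop's SequenceFile or Serializer API.
final class RawRecordSketch {

    // Serializer stand-in: turn one object into a standalone byte array.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Deserializer stand-in: rebuild an object from raw bytes.
    static Object deserialize(byte[] raw) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    // Analogue of append(Object, Object): each key and value is written as a
    // length-prefixed block of serialized bytes (an assumed layout).
    static byte[] writeRecords(List<Object[]> pairs) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(file);
        for (Object[] pair : pairs) {
            byte[] k = serialize(pair[0]);
            byte[] v = serialize(pair[1]);
            out.writeInt(k.length);
            out.write(k);
            out.writeInt(v.length);
            out.write(v);
        }
        out.flush();
        return file.toByteArray();
    }

    // Analogue of nextRaw plus manual deserialization: pull each record's raw
    // bytes first, then hand them to the deserializer.
    static List<Object[]> readRecords(byte[] data) throws IOException, ClassNotFoundException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<Object[]> result = new ArrayList<>();
        while (in.available() > 0) {
            byte[] rawKey = readRaw(in);
            byte[] rawValue = readRaw(in);
            result.add(new Object[] { deserialize(rawKey), deserialize(rawValue) });
        }
        return result;
    }

    // Read one length-prefixed block of raw bytes.
    private static byte[] readRaw(DataInputStream in) throws IOException {
        byte[] buf = new byte[in.readInt()];
        in.readFully(buf);
        return buf;
    }
}
```

Separating "read raw bytes" from "deserialize" is what lets the reader stay agnostic of the value types; only the deserialization step needs to know how the bytes were produced.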

On a related note, the RecordReader interface is unfortunately incompatible with serialization
frameworks that don't reuse objects, such as Java Serialization. The problem is that the method

boolean next(K key, V value) throws IOException

provides no way to pass keys and values deserialized from the stream back to the client
of the RecordReader. This is not a problem for Writables or Thrift, since the client passes
in objects that are updated in place. Fixing this will require some surgery on the API.
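
A minimal sketch of why the in-place signature breaks down, assuming a Writable-like mutable holder on one side and Java Serialization on the other. Java passes object references by value, so a reader that obtains a fresh object from ObjectInputStream.readObject() can only rebind its own local parameter, never the caller's variable. All names here (NextApiSketch, MutableText, nextInPlace, nextWithJavaSerialization, nextKey) are hypothetical, and the return-style nextKey is just one possible API shape, not the fix Hadoop adopted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration; none of these types are Hadoop's.
final class NextApiSketch {

    // Writable-style mutable holder: a reader can update it in place.
    static final class MutableText {
        String value = "";
        void set(String v) { value = v; }
    }

    // Same shape as next(K key, V value): fill the caller's object.
    static boolean nextInPlace(MutableText key, String decoded) {
        key.set(decoded); // mutation is visible to the caller
        return true;
    }

    // Java Serialization always produces a NEW object. Assigning it to the
    // parameter only rebinds the local reference, so the caller never sees it.
    static boolean nextWithJavaSerialization(Object key, byte[] raw)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            key = in.readObject(); // lost when this method returns
        }
        return true;
    }

    // One possible alternative shape (hypothetical): return the new object.
    static Object nextKey(byte[] raw) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    // Helper to produce serialized bytes for the examples.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }
}
```

A return-style method sidesteps the problem because the new object travels back through the return value rather than through a parameter, at the cost of allocating per record for frameworks that could otherwise reuse objects.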

> SequenceFile.Reader doesn't use the Serialization framework
> -----------------------------------------------------------
>                 Key: HADOOP-3413
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3413
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.18.0
>
> Currently SequenceFile.Reader only works with Writables, since it doesn't use the new
> Serialization framework. This is a glaring omission, considering that SequenceFile.Writer uses the Serializer
> and handles arbitrary types via the SerializationFactory.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
