hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-120) Reading an ArrayWriter does not work because valueClass does not get initialized
Date Wed, 05 Apr 2006 05:06:08 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-120?page=comments#action_12373224 ] 

Owen O'Malley commented on HADOOP-120:

After thinking about it for a bit, the problem with this patch is that it is going to encode
the type name in each and every record. So if your value type is ArrayWritable<UTF8>, you
are going to spend an extra 2+strlen("org.apache.hadoop.io.UTF8") bytes per record. That's
a fair amount of overhead.
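
To put a number on that estimate, here is a minimal sketch. It assumes the type name is written as a 2-byte length followed by one byte per ASCII character, which is what the 2+strlen figure implies. java.io.DataOutputStream.writeUTF is used only as a stand-in with that layout, not as the encoding the patch actually uses, and the class name TypeNameOverhead and the record count are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TypeNameOverhead {
    // Bytes needed to write a type name as a 2-byte length plus its characters.
    // Assumption: writeUTF stands in for the patch's real encoding.
    static int encodedSize(String typeName) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(typeName);
        }
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        int perRecord = encodedSize("org.apache.hadoop.io.UTF8");
        long records = 1_000_000L; // hypothetical record count
        System.out.println(perRecord + " bytes per record");
        System.out.println(perRecord * records + " bytes per million records");
    }
}
```

Under these assumptions the name costs 27 bytes per record (2 for the length, 25 for the characters), so roughly 27 MB per million records.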

We also have to be careful with the serialization of ArrayWritable because it is used in
the DFS name node logs.

I'm not sure what the right solution is. For now, I would probably derive a subclass
of ArrayWritable that is specific to your type. It isn't pretty, but it is guaranteed to
be safe.

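public class UTF8Array extends ArrayWritable {
  // The no-argument constructor fixes the element class, so readFields can instantiate values.
  public UTF8Array() { super(UTF8.class); }
}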

> Reading an ArrayWriter does not work because valueClass does not get initialized
> --------------------------------------------------------------------------------
>          Key: HADOOP-120
>          URL: http://issues.apache.org/jira/browse/HADOOP-120
>      Project: Hadoop
>         Type: Bug

>   Components: io
>  Environment: Red Hat  
>     Reporter: Dick King
>  Attachments: hadoop-120-fix.patch
> If you have a Reducer whose value type is an ArrayWriter, it gets enstreamed all right, but
> at reconstruction time, when ArrayWriter::readFields(DataInput in) runs on a DataInput that
> has a nonempty ArrayWriter, newInstance fails trying to instantiate the null class.
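
The failure described in the report, and the subclass workaround suggested in the comment, can be illustrated with a self-contained sketch. Everything here is a hypothetical stand-in: StringItem, ItemArray and StringItemArray are not Hadoop classes, and the sketch only assumes that the container does not write its element class to the stream, so the reader has to know it already.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class ValueClassDemo {
    // Hypothetical element type standing in for a value class such as UTF8.
    public static class StringItem {
        public String value = "";
        public StringItem() {}
        public StringItem(String value) { this.value = value; }
        void write(DataOutput out) throws IOException { out.writeUTF(value); }
        void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    }

    // Hypothetical container: the element class is not part of the stream.
    public static class ItemArray {
        private final Class<? extends StringItem> valueClass;
        StringItem[] values = new StringItem[0];

        public ItemArray() { this.valueClass = null; } // mirrors the reported bug
        protected ItemArray(Class<? extends StringItem> valueClass) { this.valueClass = valueClass; }

        void write(DataOutput out) throws IOException {
            out.writeInt(values.length);
            for (StringItem v : values) v.write(out);
        }

        void readFields(DataInput in) throws IOException {
            values = new StringItem[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                try {
                    // Throws NullPointerException when valueClass was never set.
                    values[i] = valueClass.getDeclaredConstructor().newInstance();
                } catch (ReflectiveOperationException e) {
                    throw new IOException(e);
                }
                values[i].readFields(in);
            }
        }
    }

    // Type-specific subclass: its no-argument constructor supplies the element class.
    public static class StringItemArray extends ItemArray {
        public StringItemArray() { super(StringItem.class); }
    }

    // Serializes a two-element array to bytes.
    static byte[] sample() throws IOException {
        StringItemArray a = new StringItemArray();
        a.values = new StringItem[] { new StringItem("a"), new StringItem("b") };
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        a.write(new DataOutputStream(buf));
        return buf.toByteArray();
    }

    // Returns false if deserialization hits the null element class.
    static boolean canRead(ItemArray target, byte[] data) throws IOException {
        try {
            target.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = sample();
        System.out.println("plain container: " + canRead(new ItemArray(), data));
        System.out.println("typed subclass:  " + canRead(new StringItemArray(), data));
    }
}
```

In this sketch the plain container cannot read a nonempty array back, while the typed subclass can, and the stream format stays unchanged because no type name is written per record.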
