hadoop-common-dev mailing list archives

From "Jingkei Ly (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5589) TupleWritable: Lift implicit limit on the number of values that can be stored
Date Wed, 15 Apr 2009 21:26:14 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699395#action_12699395 ]

Jingkei Ly commented on HADOOP-5589:
------------------------------------

The patch could be backward-compatible if the bitset were written to the stream as VLongs
(essentially what was implemented in HADOOP-5589-2.patch, minus the bug with sparse bitsets),
because the bytes written to the stream would be exactly the same in both implementations as long
as there were fewer than 64 values.
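
As a rough illustration of why the two layouts can agree, here is a minimal sketch using only java.util.BitSet. It is not the code from any of the attached patches, the class and method names are hypothetical, and the VLong encoding step is left out: it only shows that the first 64-bit word of a bitset equals the old single-long mask when every position is below 64, so writing that word with the same encoding would produce the same bytes.

```java
import java.util.BitSet;

// Hypothetical sketch, not taken from the HADOOP-5589 patches.
public class BitSetWordCompat {

    // Old-style mask: one bit per written position, packed into a single long.
    // Only meaningful for positions 0..63.
    public static long longMask(int... positions) {
        long mask = 0L;
        for (int p : positions) {
            mask |= 1L << p;
        }
        return mask;
    }

    // New-style mask: a BitSet exposed as 64-bit words, lowest positions first.
    public static long[] bitSetWords(int... positions) {
        BitSet bits = new BitSet();
        for (int p : positions) {
            bits.set(p);
        }
        return bits.toLongArray();
    }

    public static void main(String[] args) {
        long old = longMask(0, 3, 62);
        long[] words = bitSetWords(0, 3, 62);
        // With every position below 64 there is exactly one word,
        // and it matches the old long bit for bit.
        System.out.println(words.length == 1 && words[0] == old); // prints "true"

        // A position at or above 64 adds a second word that the old format never wrote.
        System.out.println(bitSetWords(0, 3, 62, 70).length); // prints "2"
    }
}
```

The second case is where the formats diverge: a reader expecting extra words for a large tuple would find nothing more to read in a stream produced by the old single-long format.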

However, because we can't read an old TupleWritable containing more than 64 values without throwing
an EOFException, it won't be "fully" backward-compatible.

While I would be tempted to argue that TupleWritable never supported more than 64 values in a tuple
anyway, is there still a need to support users who were storing tuples of more than 64 values,
albeit with incorrect results?



> TupleWritable: Lift implicit limit on the number of values that can be stored
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5589
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5589
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Jingkei Ly
>            Assignee: Jingkei Ly
>         Attachments: HADOOP-5589-1.patch, HADOOP-5589-2.patch, HADOOP-5589-3.patch
>
>
> TupleWritable uses an instance field of the primitive type long, which I presume is
> so that it can quickly determine whether a position has been written to in its array of Writables
> (by using bit-shifting operations on the long field). The problem with this is that it implies
> a maximum limit of 64 values that you can store in a TupleWritable.
> An example of a use case where I think this would be a problem is if you had two MR jobs
> with over 64 reduce tasks and you wanted to join the outputs with CompositeInputFormat;
> this will probably cause unexpected results in the current scheme.
> At the very least, the 64-value limit should be documented in TupleWritable.
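
To make the limit described above concrete, here is a minimal sketch of a single-long bitmask. The class is a hypothetical stand-in, not TupleWritable itself; it only assumes the mask is updated with `1L << i`, as the description suggests. In Java a long shift uses only the low six bits of the shift distance, so position 64 silently lands on the same bit as position 0.

```java
// Hypothetical stand-in for a "written" mask kept in one long field.
public class LongMaskLimit {
    private long written = 0L;

    // Mark position i as written.
    public void setWritten(int i) {
        written |= 1L << i;
    }

    // Report whether position i has been marked as written.
    public boolean has(int i) {
        return 0 != ((1L << i) & written);
    }

    public static void main(String[] args) {
        LongMaskLimit mask = new LongMaskLimit();
        mask.setWritten(64);
        // 1L << 64 is the same as 1L << 0, so position 0 now looks written too.
        System.out.println(mask.has(64)); // prints "true"
        System.out.println(mask.has(0));  // prints "true"
        System.out.println(mask.has(1));  // prints "false"
    }
}
```

No exception is thrown at any point, which fits the "unexpected results" wording: positions past 63 alias onto lower ones rather than failing loudly.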

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

