hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6000) native-task: Simplify ByteBufferDataReader/Writer
Date Thu, 24 Jul 2014 08:17:38 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072976#comment-14072976
] 

Todd Lipcon commented on MAPREDUCE-6000:
----------------------------------------

I chatted offline with Sean a bit about this one, and we realized there are a couple issues:
- he made the point that, even if we don't internally use all of the methods, users might
expect a DataOutput/DataInput implementation to be fully functional.
- I hadn't realized that DataInput/DataOutput use "modified UTF-8", and not actually UTF-8.
I added a new test case with a unicode cat face and found that my new implementation was incompatible
with Java's implementation, since I was using "real UTF8" and not "modified".

But, we still need to do something like this patch, since the current code actually appears
to copy-paste stuff from the Oracle JDK, and thus isn't Apache-license-compatible. So, I'll
upload a new rev of the patch with the following changes since the previous one:

- add a test case with unicode cat face (it's outside the "basic multilingual plane" so tends
to expose bugs like the above)
- instead of trying to implement the more complex methods, the reader/writer classes now have
an internal java.io.Data{Input,Output}Stream instance. We delegate to these for any of the
more substantial methods, so that we don't have to duplicate any encoding/decoding code. This
is safe since those classes don't do any of their own buffering.

> native-task: Simplify ByteBufferDataReader/Writer
> -------------------------------------------------
>
>                 Key: MAPREDUCE-6000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6000
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Minor
>         Attachments: mapreduce-6000.txt, mapreduce-6000.txt
>
>
> The ByteBufferDataReader and ByteBufferDataWriter class are more complex than necessary:
> - several methods related to reading/writing strings and char arrays are implemented
but never used by the native task code. Given that the use case for these classes is limited
to serializing binary data to/from the native code, it seems unlikely people will want to
use these methods in any performance-critical space. So, let's do simpler implementations
that are less likely to be buggy, even if they're slightly less performant.
> - methods like readLine() are even less likely to be used. Since it's a complex implementation,
let's just throw UnsupportedOperationException
> - in the test case, we can use Mockito to shorten the amount of new code



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message