hadoop-common-issues mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-1722) Make streaming to handle non-utf8 byte array
Date Sun, 01 Jul 2012 16:11:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404754#comment-13404754 ]

eric baldeschwieler commented on HADOOP-1722:

This does seem like a bug.  I'd expect the combiner to ignore the output property and always
issue its output in the same format as its input.  So this shouldn't require new properties
unless I'm confused.
> Make streaming to handle non-utf8 byte array
> --------------------------------------------
>                 Key: HADOOP-1722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1722
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Runping Qi
>            Assignee: Klaas Bosteels
>             Fix For: 1.0.2, 0.21.0
>         Attachments: HADOOP-1722-branch-0.18.patch, HADOOP-1722-branch-0.19.patch, HADOOP-1722-v0.20.1.patch,
>                      HADOOP-1722-v2.patch, HADOOP-1722-v3.patch, HADOOP-1722-v4.patch, HADOOP-1722-v4.patch, HADOOP-1722-v5.patch,
>                      HADOOP-1722-v6.patch, HADOOP-1722.patch
> Right now, the streaming framework expects the output of the streaming process (mapper or
> reducer) to be line-oriented UTF-8 text. This limitation makes it impossible to use programs
> whose output may be non-UTF-8 (international encodings, or even binary data). Streaming can
> overcome this limitation by introducing a simple encoding protocol. For example, it can allow
> the mapper/reducer to hex-encode its keys/values, and the framework decodes them on the Java side.
> This way, as long as the mapper/reducer executables follow this encoding protocol,
> they can output arbitrary byte arrays and the streaming framework can handle them.
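
For illustration only, the hex-encoding idea in the issue description could look roughly
like the sketch below. It is not taken from any of the attached patches. The class name
HexLineCodec, its method names, and the "hex(key) TAB hex(value)" line layout are all
assumptions made for this example. The point is that hex digits never collide with the
tab or newline separators, so arbitrary bytes survive a line-oriented text channel.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: a sketch of a hex-based
// line protocol for passing arbitrary bytes through a text stream.
final class HexLineCodec {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Encode each byte as two lowercase hex digits.
    public static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(DIGITS[(b >> 4) & 0x0f]).append(DIGITS[b & 0x0f]);
        }
        return sb.toString();
    }

    // Decode a hex string back into the original bytes.
    public static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at byte " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // What a mapper/reducer executable would write: hex(key) TAB hex(value).
    public static String encodeRecord(byte[] key, byte[] value) {
        return encode(key) + "\t" + encode(value);
    }

    // What the framework side would do with one line of process output.
    public static byte[][] decodeRecord(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("missing tab separator");
        }
        return new byte[][] {
            decode(line.substring(0, tab)),
            decode(line.substring(tab + 1))
        };
    }

    public static void main(String[] args) {
        // The key contains a newline byte and the value a tab byte; both
        // would break a plain line-oriented, tab-separated text protocol.
        byte[] key = {(byte) 0xff, 0x00, 0x0a};
        byte[] value = {0x09, (byte) 0x80};
        String line = encodeRecord(key, value);
        System.out.println(line);
        byte[][] kv = decodeRecord(line);
        System.out.println(Arrays.equals(key, kv[0]) && Arrays.equals(value, kv[1]));
    }
}
```

The trade-off of such a scheme is that every record doubles in size on the wire, which
is one reason a binary framing format might be preferred over hex encoding in practice.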


