hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1722) Make streaming to handle non-utf8 byte array
Date Thu, 05 Feb 2009 17:41:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670833#action_12670833 ]

Owen O'Malley commented on HADOOP-1722:
---------------------------------------

I thought about that, but since streaming is primarily used by developers who don't want to
write Java, I think the usability is better if we just have a set of enums/strings that we
map into classes in streaming.

"typed.bytes" -> TypedBytesMRInputWriter / TypedBytesMROutputProcessor
"text" -> TextMRInputWriter / TextMROutputProcessor
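The string-to-class mapping proposed above can be sketched as a small in-memory registry. This is a minimal illustration, not the patch's code. The four class names come from the comment, but here they are empty placeholders because their real interfaces are not shown. `StreamingFormats`, `Pair`, and `resolve` are hypothetical names and are not part of Hadoop streaming.

```java
import java.util.Map;

// Hypothetical sketch: resolves a user-supplied identifier string to a
// matching (input writer, output processor) class pair.
public class StreamingFormats {

    // Empty placeholders standing in for the classes named in the comment.
    static class TypedBytesMRInputWriter {}
    static class TypedBytesMROutputProcessor {}
    static class TextMRInputWriter {}
    static class TextMROutputProcessor {}

    // One identifier maps to one writer/processor pair.
    record Pair(Class<?> inputWriter, Class<?> outputProcessor) {}

    private static final Map<String, Pair> REGISTRY = Map.of(
            "typed.bytes", new Pair(TypedBytesMRInputWriter.class, TypedBytesMROutputProcessor.class),
            "text", new Pair(TextMRInputWriter.class, TextMROutputProcessor.class));

    // Looks up an identifier and fails loudly on unknown ones, so a typo on
    // the command line is reported instead of silently falling back.
    public static Pair resolve(String id) {
        Pair p = REGISTRY.get(id);
        if (p == null) {
            throw new IllegalArgumentException("Unknown streaming format: " + id);
        }
        return p;
    }

    public static void main(String[] args) {
        Pair p = resolve("text");
        System.out.println(p.inputWriter().getSimpleName() + " / " + p.outputProcessor().getSimpleName());
    }
}
```

With this design, users who never write Java only need to choose a short string, while the Java classes behind each string stay an internal detail of streaming.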

You might consider renaming the classes to something like StreamingInputWriter and StreamingOutputReader,
which are more symmetric with each other.
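The symmetry behind the suggested names shows most clearly when the two roles are written as mirror-image interfaces. The sketch below is a hypothetical illustration. Only the interface names come from the comment. The method names, the `TextStreaming` wrapper, and the tab-separated "text" encoding are all assumptions and do not represent Hadoop's actual API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch: interface names follow the renaming suggestion;
// method names and the tab-separated "text" format are assumptions.
public class TextStreaming {

    // Java side -> external process: serializes one key/value record.
    interface StreamingInputWriter {
        void writeRecord(String key, String value);
    }

    // External process -> Java side: parses one record, or returns null at end of input.
    interface StreamingOutputReader {
        String[] readRecord();
    }

    // "text" writer: one record per line, key and value separated by a tab.
    static class TextInputWriter implements StreamingInputWriter {
        private final Writer out;

        TextInputWriter(Writer out) {
            this.out = out;
        }

        @Override
        public void writeRecord(String key, String value) {
            try {
                out.write(key + "\t" + value + "\n");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // "text" reader: splits each line at the first tab; a line with no tab
    // is treated as a key with an empty value.
    static class TextOutputReader implements StreamingOutputReader {
        private final BufferedReader in;

        TextOutputReader(BufferedReader in) {
            this.in = in;
        }

        @Override
        public String[] readRecord() {
            try {
                String line = in.readLine();
                if (line == null) {
                    return null;
                }
                int tab = line.indexOf('\t');
                return tab < 0
                        ? new String[] {line, ""}
                        : new String[] {line.substring(0, tab), line.substring(tab + 1)};
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

This text pair breaks as soon as a key contains a tab or any field contains a newline. That is exactly the limitation HADOOP-1722 sets out to remove.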

If someone implemented my suggestion above, it could be called "backquoted" or something similar.


By the way, you could of course use a different identifier string, such as "typedBytes" or "typed_bytes".


> Make streaming to handle non-utf8 byte array
> --------------------------------------------
>
>                 Key: HADOOP-1722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1722
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Runping Qi
>            Assignee: Klaas Bosteels
>         Attachments: HADOOP-1722-v2.patch, HADOOP-1722-v3.patch, HADOOP-1722-v4.patch, HADOOP-1722-v4.patch, HADOOP-1722.patch
>
>
> Right now, the streaming framework expects the outputs of the stream process (mapper or
> reducer) to be line-oriented UTF-8 text. This limitation makes it impossible to use programs
> whose outputs may be non-UTF-8 (international encodings, or even binary data). Streaming
> can overcome this limitation by introducing a simple encoding protocol. For example, it can
> allow the mapper/reducer to hex-encode its keys/values, and the framework decodes them on
> the Java side. This way, as long as the mapper/reducer executables follow this encoding
> protocol, they can output arbitrary byte arrays and the streaming framework can handle them.
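The hex-encoding protocol from the issue description can be sketched in Java. This is an illustration under stated assumptions: each record is taken to be one line of the form hexKey<TAB>hexValue. `HexProtocol`, `encode`, `decode`, and `decodeLine` are hypothetical names and do not come from the attached patches. Hex digits never include tab or newline bytes, so arbitrary binary keys and values survive the line-oriented framing.

```java
import java.util.Arrays;

// Hypothetical sketch of the proposed protocol: the executable hex-encodes
// key/value bytes, and the Java side decodes them back to raw bytes.
public class HexProtocol {

    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Encodes each byte as two lowercase hex digits (what the executable would emit).
    public static String encode(byte[] data) {
        char[] out = new char[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xff;
            out[2 * i] = DIGITS[b >>> 4];
            out[2 * i + 1] = DIGITS[b & 0x0f];
        }
        return new String(out);
    }

    // Decodes a hex string (either case) back into the original bytes.
    public static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Splits one output line "hexKey<TAB>hexValue" into decoded key and value bytes;
    // a line without a tab is treated as a key with an empty value.
    public static byte[][] decodeLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new byte[][] {decode(line), new byte[0]};
        }
        return new byte[][] {decode(line.substring(0, tab)), decode(line.substring(tab + 1))};
    }

    public static void main(String[] args) {
        byte[] key = {(byte) 0xff, 0x00, 0x0a};  // not valid UTF-8, and contains a newline byte
        byte[] value = {(byte) 0xc3, 0x28};      // an invalid UTF-8 sequence
        String line = encode(key) + "\t" + encode(value);
        System.out.println(line);
        byte[][] kv = decodeLine(line);
        System.out.println(Arrays.equals(kv[0], key) && Arrays.equals(kv[1], value));
    }
}
```

Running `main` prints the encoded line (ff000a, a tab, then c328) followed by true. This shows that a key containing 0xff and a raw newline byte round-trips intact. The cost is that every byte doubles in size on the wire.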

-- 
This message is automatically generated by JIRA.

