hadoop-common-dev mailing list archives

From "arkady borkovsky (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1722) Make streaming to handle non-utf8 byte array
Date Wed, 29 Aug 2007 17:48:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523638
] 

arkady borkovsky commented on HADOOP-1722:
------------------------------------------

Passing data from DFS to a streaming mapper should be transparent.
By default, the mapper task should receive exactly the same bytes as are stored in DFS, without
any transformation.
There should also be command-line parameters that specify other useful options, including
a custom input format, decompression, etc.
There should be no requirements on the command that is used as the streaming mapper.
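
As an illustration of what "exactly the same bytes" means from the mapper's side, here is a minimal sketch of a mapper command that treats its input as opaque bytes. It is not part of Hadoop; the names copy_bytes, main and CHUNK_SIZE are hypothetical, and the sketch assumes only that the framework delivers the input on stdin.

```python
import sys

CHUNK_SIZE = 64 * 1024  # arbitrary buffer size chosen for this sketch


def copy_bytes(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst as raw bytes, never decoding; return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def main():
    # sys.stdin.buffer and sys.stdout.buffer are the binary streams underneath
    # the text-mode wrappers, so no character decoding is applied here.
    copy_bytes(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

A script that calls main() could serve as a regression check: if the framework is transparent, the bytes such a mapper sees match the bytes stored in DFS, including any that are not valid UTF-8.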

This has been broken twice -- in Sept. 2006, and in July 2007.
It would be nice to restore the functionality and make it part of the specification. (This implies
adding regression cases, etc.)

> Make streaming to handle non-utf8 byte array
> --------------------------------------------
>
>                 Key: HADOOP-1722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1722
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Runping Qi
>
> Right now, the streaming framework expects the outputs of the streaming process (mapper or
> reducer) to be line-oriented UTF-8 text. This limitation makes it impossible to use programs
> whose outputs may be non-UTF-8 (international encodings, or maybe even binary data).
> Streaming can overcome this limitation by introducing a simple encoding protocol. For example,
> it can allow the mapper/reducer to hex-encode its keys/values, and the framework decodes them
> on the Java side.
> This way, as long as the mapper/reducer executables follow this encoding protocol,
> they can output arbitrary byte arrays and the streaming framework can handle them.
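
The hex-encoding idea in the description above could look roughly like the following sketch. The issue only says that keys and values are hex-encoded; the framing used here (lowercase hex, a tab between key and value, a newline after each record) is an assumption, and encode_record and decode_record are hypothetical names. In the proposal the decoding would happen on the Java side; it is written in Python here only to show the round trip.

```python
def encode_record(key: bytes, value: bytes) -> bytes:
    """Mapper/reducer side: emit one ASCII line for an arbitrary key/value pair."""
    # Hex output contains only 0-9 and a-f, so it can never collide with the
    # tab separator or the newline terminator assumed by this sketch.
    return key.hex().encode("ascii") + b"\t" + value.hex().encode("ascii") + b"\n"


def decode_record(line: bytes) -> tuple:
    """Framework side: recover the original key and value bytes from one line."""
    hex_key, hex_value = line.rstrip(b"\n").split(b"\t", 1)
    return (
        bytes.fromhex(hex_key.decode("ascii")),
        bytes.fromhex(hex_value.decode("ascii")),
    )
```

The cost of this scheme is that every record doubles in size, which is the trade-off for keeping the stream line-oriented and safe for arbitrary byte arrays.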

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

