Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: core-dev@hadoop.apache.org
Message-ID: <691129370.1236313196266.JavaMail.jira@brutus>
Date: Thu, 5 Mar 2009 20:19:56 -0800 (PST)
From: "zhuweimin (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-1722) Make streaming to handle non-utf8 byte array
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679487#action_12679487 ]

zhuweimin commented on HADOOP-1722:
-----------------------------------

The error occurred when using the -D option. The following is the error message:

[hadoop@super03 hadoop-latest]$ hadoop jar contrib/streaming/hadoop-0.19.1-streaming.jar -input data -output result -mapper "wc -c" -numReduceTasks 0 -D stream.map.input=rawbytes
09/03/06 13:18:31 ERROR streaming.StreamJob: Unexpected -D while processing -input|-output|-mapper|-combiner|-reducer|-file|-dfs|-jt|-additionalconfspec|-inputformat|-outputformat|-partitioner|-numReduceTasks|-inputreader|-mapdebug|-reducedebug|||-cacheFile|-cacheArchive|-io|-verbose|-info|-debug|-inputtagged|-help
Usage: $HADOOP_HOME/bin/hadoop jar \
          $HADOOP_HOME/hadoop-streaming.jar [options]
Options:
  -input          DFS input file(s) for the Map step
  -output         DFS output directory for the Reduce step
  -mapper         The streaming command to run
  -combiner       Combiner has to be a Java class
  -reducer        The streaming command to run
  -file           File/dir to be shipped in the Job jar file
  -inputformat    TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional.
  -outputformat   TextOutputFormat(default)|JavaClassName Optional.
  -partitioner    JavaClassName Optional.
  -numReduceTasks Optional.
  -inputreader    Optional.
  -cmdenv = Optional. Pass env.var to streaming commands
  -mapdebug       Optional. To run this script when a map task fails
  -reducedebug    Optional. To run this script when a reduce task fails
  -io             Optional.
  -verbose

Generic options supported are
-conf      specify an application configuration file
-D         use value for given property
-fs        specify a namenode
-jt        specify a job tracker
-files     specify comma separated files to be copied to the map reduce cluster
-libjars   specify comma separated jar files to include in the classpath.
-archives  specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

For more details about these options:
Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info

Streaming Command Failed!

> Make streaming to handle non-utf8 byte array
> --------------------------------------------
>
>                 Key: HADOOP-1722
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1722
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Runping Qi
>            Assignee: Klaas Bosteels
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-1722-branch-0.18.patch, HADOOP-1722-branch-0.19.patch, HADOOP-1722-v2.patch, HADOOP-1722-v3.patch, HADOOP-1722-v4.patch, HADOOP-1722-v4.patch, HADOOP-1722-v5.patch, HADOOP-1722-v6.patch, HADOOP-1722.patch
>
>
> Right now, the streaming framework expects the outputs of the stream process (mapper or reducer) to be
> line-oriented UTF-8 text. This limit makes it impossible to use programs whose outputs may be non-UTF-8
> (international encodings, or maybe even binary data). Streaming can overcome this limit by introducing a simple
> encoding protocol. For example, it can allow the mapper/reducer to hex-encode its keys/values,
> which the framework then decodes on the Java side.
> This way, as long as the mapper/reducer executables follow this encoding protocol,
> they can output arbitrary byte arrays and the streaming framework can handle them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
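
For illustration only, the hex-encoding idea in the quoted issue description could be sketched roughly as follows. This is not taken from any of the attached patches: the function names encode_record and decode_record are hypothetical, the tab-separated, newline-terminated record layout is an assumption, and the decoding step (which would live on the Java side in the framework) is mirrored in Python just to show the round trip.

```python
import binascii
from typing import Tuple


def encode_record(key: bytes, value: bytes) -> bytes:
    """Mapper/reducer side (hypothetical helper): emit one text line.

    Both fields are hex-encoded, so the line contains only [0-9a-f],
    a single tab separator and a trailing newline, whatever bytes the
    original key and value held.
    """
    return binascii.hexlify(key) + b"\t" + binascii.hexlify(value) + b"\n"


def decode_record(line: bytes) -> Tuple[bytes, bytes]:
    """Framework side (hypothetical helper; Java in real streaming).

    Splits on the first tab and reverses the hex encoding.
    """
    hex_key, hex_value = line.rstrip(b"\n").split(b"\t", 1)
    return binascii.unhexlify(hex_key), binascii.unhexlify(hex_value)


if __name__ == "__main__":
    # A key containing a tab and a newline, and a value covering every
    # byte value, both survive the line-oriented transport.
    key = b"\xff\xfe\tkey\n"
    value = bytes(range(256))
    assert decode_record(encode_record(key, value)) == (key, value)
```

Hex encoding doubles the size of every record, which is the price paid here for keeping the existing line-oriented transport unchanged.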