Subject: Re: Problem with streaming exact binary chunks
From: Youssef Hatem
Date: Thu, 10 Oct 2013 14:24:24 +0200
To: user@hadoop.apache.org

Hello,

Thanks a lot for the information. It helped me figure out the solution to this problem. I posted a sketch of the solution on StackOverflow (http://stackoverflow.com/a/19295610/337194) for anybody who is interested.

Best regards,
Youssef Hatem

On Oct 9, 2013, at 14:08, Peter Marron wrote:

> Hi,
>
> The only way that I could find was to override the various InputWriter and OutputReader classes,
> as defined by the configuration settings
>   stream.map.input.writer.class
>   stream.map.output.reader.class
>   stream.reduce.input.writer.class
>   stream.reduce.output.reader.class
> which was painful. Hopefully someone will tell you the _correct_ way to do this.
> If not, I will provide more details.
>
> Regards,
>
> Peter Marron
> Trillium Software UK Limited
>
> Tel : +44 (0) 118 940 7609
> Fax : +44 (0) 118 940 7699
> E: Peter.Marron@TrilliumSoftware.com
>
> -----Original Message-----
> From: Youssef Hatem [mailto:youssef.hatem@rwth-aachen.de]
> Sent: 09 October 2013 12:14
> To: user@hadoop.apache.org
> Subject: Problem with streaming exact binary chunks
>
> Hello,
>
> I wrote a very simple InputFormat and RecordReader to send binary data to mappers. Binary data can contain anything (including \n, \t, \r). Here is what next() may actually send:
>
> public class MyRecordReader implements
>         RecordReader<BytesWritable, BytesWritable> {
>     ...
>     public boolean next(BytesWritable key, BytesWritable ignore)
>             throws IOException {
>         ...
>
>         // Eight bytes 01..08, with bytes 4 and 5 overwritten by '\n'.
>         byte[] result = new byte[8];
>         for (int i = 0; i < result.length; ++i)
>             result[i] = (byte)(i+1);
>         result[3] = (byte)'\n';
>         result[4] = (byte)'\n';
>
>         key.set(result, 0, result.length);
>         return true;
>     }
> }
>
> As you can see, I am using BytesWritable to send eight bytes: 01 02 03 0a 0a 06 07 08. I also use HADOOP-1722 typed bytes (by setting -D stream.map.input=typedbytes).
>
> According to the documentation of typed bytes, the mapper should receive the following byte sequence:
> 00 00 00 08 01 02 03 0a 0a 06 07 08
>
> However, the bytes are somehow modified and I get the following sequence instead:
> 00 00 00 08 01 02 03 09 0a 09 0a 06 07 08
>
> 0a = '\n'
> 09 = '\t'
>
> It seems that Hadoop (streaming?) parsed the newline character as a record separator and inserted '\t', which I assume is the key/value separator for streaming.
>
> Is there any workaround to send *exactly* the same byte sequence no matter what characters are in the sequence? Thanks in advance.
>
> Best regards,
> Youssef Hatem
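
For readers of this thread, the byte sequence the quoted message expects (a 4-byte length followed by the untouched payload) can be illustrated with a small, self-contained Java sketch. It uses only JDK classes and is not Hadoop's typed bytes implementation. The class name LengthPrefixedFraming and its helpers frame, unframe and hex are hypothetical, introduced only for illustration. The point is that a reader which trusts the length prefix never has to interpret '\n' or '\t' inside the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration class; not part of Hadoop.
public class LengthPrefixedFraming {

    /** Prefixes the payload with its length as a 4-byte big-endian integer. */
    static byte[] frame(byte[] payload) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(payload.length); // 4-byte big-endian length
            out.write(payload);           // raw bytes, no escaping
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the length prefix, then exactly that many payload bytes. */
    static byte[] unframe(byte[] framed) {
        try {
            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(framed));
            int length = in.readInt();
            byte[] payload = new byte[length];
            in.readFully(payload); // separators inside the payload are ignored
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Formats bytes as space-separated two-digit lowercase hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same eight bytes as in the quoted RecordReader example.
        byte[] payload = {1, 2, 3, (byte) '\n', (byte) '\n', 6, 7, 8};
        byte[] framed = frame(payload);
        System.out.println(hex(framed));
        // prints "00 00 00 08 01 02 03 0a 0a 06 07 08"
        System.out.println(Arrays.equals(unframe(framed), payload));
        // prints "true"
    }
}
```

A line-oriented reader applied to the same bytes would instead split the payload at each 0a, which is consistent with the corruption described in the quoted message.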