From: Youssef Hatem
Subject: Problem with streaming exact binary chunks
Date: Wed, 09 Oct 2013 13:13:53 +0200
To: user@hadoop.apache.org

Hello,

I wrote a very simple InputFormat and RecordReader to send binary data to mappers. The binary data can contain any byte values (including \n, \t and \r). Here is what next() may actually send:

    public class MyRecordReader implements RecordReader<BytesWritable, BytesWritable> {
        ...
        public boolean next(BytesWritable key, BytesWritable ignore) throws IOException {
            ...
            byte[] result = new byte[8];
            for (int i = 0; i < result.length; ++i)
                result[i] = (byte)(i+1);
            result[3] = (byte)'\n';
            result[4] = (byte)'\n';
            key.set(result, 0, result.length);
            return true;
        }
    }

As you can see, I am using BytesWritable to send eight bytes: 01 02 03 0a 0a 06 07 08. I also use HADOOP-1722 typed bytes (by setting -D stream.map.input=typedbytes).

According to the typed bytes documentation, the mapper should receive the following byte sequence:

    00 00 00 08 01 02 03 0a 0a 06 07 08

However, the bytes are somehow modified and I get the following sequence instead:

    00 00 00 08 01 02 03 09 0a 09 0a 06 07 08

    0a = '\n'
    09 = '\t'

It seems that Hadoop (streaming?) treated the newline character as a separator and inserted '\t', which I assume is the key/value separator for streaming.

Is there any workaround to send *exactly* the same byte sequence, no matter what bytes it contains?

Thanks in advance.

Best regards,
Youssef Hatem
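
For reference, the framing expected above can be sketched in plain Java without any Hadoop classes. This is only an illustration of a 4-byte big-endian length prefix followed by the raw payload, matching the byte sequence quoted in the message; it is not Hadoop's typed bytes implementation, and the class and method names (LengthPrefixedFraming, encode, decode, toHex) are hypothetical. The real typed bytes format may carry additional information, such as a type code, that this sketch leaves out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper class, not part of Hadoop.
public class LengthPrefixedFraming {

    // Frame a payload as a 4-byte big-endian length followed by the raw bytes.
    public static byte[] encode(byte[] payload) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
        return buffer.toByteArray();
    }

    // Read the length first, then exactly that many bytes.
    // No byte value (0x0a, 0x09, ...) is treated as a separator.
    public static byte[] decode(byte[] framed) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);
        return payload;
    }

    // Render bytes as space-separated two-digit hex values.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Same eight bytes as in the RecordReader above.
        byte[] payload = {1, 2, 3, 0x0a, 0x0a, 6, 7, 8};
        byte[] framed = encode(payload);
        System.out.println(toHex(framed));
        // prints: 00 00 00 08 01 02 03 0a 0a 06 07 08
        System.out.println(Arrays.equals(payload, decode(framed)));
        // prints: true
    }
}
```

Because the reader consumes exactly the number of bytes announced by the prefix, embedded newline or tab bytes pass through unchanged. Any step in the pipeline that instead splits on '\n' or inserts '\t' would produce the kind of altered sequence described in the message.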