Message-ID: <11725091.1159475215787.JavaMail.jira@brutus>
Date: Thu, 28 Sep 2006 13:26:55 -0700 (PDT)
From: "Sameer Paranjpye (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-439) Streaming does not work for text data if the records don't fit in a short UTF8 [2^16/3 characters]
In-Reply-To: <8502499.1155174613900.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

     [ http://issues.apache.org/jira/browse/HADOOP-439?page=all ]

Sameer Paranjpye updated HADOOP-439:
------------------------------------

    Component/s: contrib/streaming

> Streaming does not work for text data if the records don't fit in a short UTF8 [2^16/3 characters]
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-439
>                 URL: http://issues.apache.org/jira/browse/HADOOP-439
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.5.0
>            Reporter: Dick King
>         Assigned To: Hairong Kuang
>            Priority: Critical
>
> The streaming code internally reads the input data into a UTF8. This causes truncated data to be shipped to the mapper when the input exceeds about 21000 characters, with no notice to the user except possibly in individual tasks' machines' logs, which people would not normally read for apparently successful jobs.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
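
The "2^16/3 characters" figure in the issue title can be checked with a short Java sketch. It is not the Hadoop UTF8 class: the class name ShortUtf8Limit and its helper methods are hypothetical, and the silent truncation shown only mimics the behaviour the report describes. The sketch assumes a 2-byte length prefix and a worst case of 3 encoded bytes per Java char. The JDK's DataOutputStream.writeUTF uses the same kind of 2-byte prefix, but it throws an exception instead of truncating, so it serves here only as an independent check on the size bound.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ShortUtf8Limit {
    // A 16-bit unsigned length prefix can describe at most 65535 encoded bytes.
    static final int MAX_ENCODED_BYTES = 0xFFFF;
    // Assumption: up to 3 encoded bytes per Java char, so this many chars always fit.
    static final int MAX_SAFE_CHARS = MAX_ENCODED_BYTES / 3;

    // Hypothetical helper: silently cuts a record down to the safe cap,
    // mimicking the truncation the issue reports (not Hadoop's actual code).
    static String truncateToSafeLength(String record) {
        return record.length() > MAX_SAFE_CHARS
                ? record.substring(0, MAX_SAFE_CHARS)
                : record;
    }

    // True if the JDK's short-length-prefixed encoding accepts the string.
    static boolean fitsInWriteUTF(String record) {
        try {
            new DataOutputStream(new ByteArrayOutputStream()).writeUTF(record);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false; // encoded form exceeds 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string of n copies of c (avoids String.repeat for older JDKs).
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println("safe cap in chars: " + MAX_SAFE_CHARS);

        // U+20AC (euro sign) takes 3 bytes in the JDK's modified UTF-8.
        String atCap = repeat('\u20AC', MAX_SAFE_CHARS);
        String overCap = repeat('\u20AC', MAX_SAFE_CHARS + 1);
        System.out.println("at cap fits: " + fitsInWriteUTF(atCap));
        System.out.println("one over fits: " + fitsInWriteUTF(overCap));

        // A 30000-char record loses its tail with no error raised.
        String longRecord = repeat('x', 30000);
        System.out.println("chars kept: " + truncateToSafeLength(longRecord).length());
    }
}
```

Truncating at a fixed character count (rather than at the actual encoded byte count) is what makes the limit show up at about 21000 characters even for plain ASCII input, where each character needs only one byte.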