Message-ID: <2006631731.1143905847516.JavaMail.jira@ajax>
Date: Sat, 1 Apr 2006 16:37:27 +0100 (BST)
From: "eric baldeschwieler (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-115) Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.
In-Reply-To: <393765989.1143819100222.JavaMail.jira@ajax>

    [ http://issues.apache.org/jira/browse/HADOOP-115?page=comments#action_12372783 ]

eric baldeschwieler commented on HADOOP-115:
--------------------------------------------

Ah! Are you suggesting that getOutput* always describes the final classes output from reduce, and that if you don't set the new MapOutput* variables it also controls the map output? That is clear enough and backwards compatible. I just was not looking at it the right way!

> Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-115
>          URL: http://issues.apache.org/jira/browse/HADOOP-115
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Reporter: Runping Qi
>
> When map tasks write intermediate data out, they always use the SequenceFile RecordWriter with key/value classes from the job object.
> When the reducers write the final results out, the output format is obtained from the job object. By default it is TextOutputFormat, so there is no conflict.
> However, if one wants to use SequenceFileOutputFormat for the final results, then the key/value classes are also obtained from the job object, the same as for the map tasks' output. Now we have a problem: it is impossible for the map outputs and reducer outputs to use different key/value classes if one wants the reducers to generate output in SequenceFileOutputFormat.
> A simple fix would be to add another two attributes to the JobConf class: mapOutputKeyClass and mapOutputValueClass.
> That allows the user to have different key/value classes for the intermediate and final outputs.
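
To make the semantics discussed in this thread concrete, here is a minimal sketch of the fallback behaviour: the output key/value classes always describe what reduce writes, and the map output classes default to them unless set explicitly. SketchJobConf, its property keys and its method bodies are hypothetical stand-ins written for illustration; this is not the real JobConf and not necessarily how the fix would be implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for JobConf, for illustration only.
final class SketchJobConf {
    // Assumed property keys; the real configuration names may differ.
    private final Map<String, Class<?>> classes = new HashMap<>();

    // Classes of the final (reduce) output.
    void setOutputKeyClass(Class<?> c) { classes.put("output.key", c); }
    void setOutputValueClass(Class<?> c) { classes.put("output.value", c); }
    Class<?> getOutputKeyClass() { return classes.get("output.key"); }
    Class<?> getOutputValueClass() { return classes.get("output.value"); }

    // Classes of the intermediate (map) output; optional.
    void setMapOutputKeyClass(Class<?> c) { classes.put("map.output.key", c); }
    void setMapOutputValueClass(Class<?> c) { classes.put("map.output.value", c); }

    // Fall back to the final output classes when no map-specific class
    // was set, so existing jobs keep their current behaviour.
    Class<?> getMapOutputKeyClass() {
        Class<?> c = classes.get("map.output.key");
        return c != null ? c : getOutputKeyClass();
    }

    Class<?> getMapOutputValueClass() {
        Class<?> c = classes.get("map.output.value");
        return c != null ? c : getOutputValueClass();
    }

    public static void main(String[] args) {
        SketchJobConf conf = new SketchJobConf();
        conf.setOutputKeyClass(String.class);
        conf.setOutputValueClass(Long.class);

        // Nothing map-specific set yet: map output follows the final output.
        System.out.println(conf.getMapOutputValueClass().getSimpleName());

        // Override only the intermediate value class.
        conf.setMapOutputValueClass(Integer.class);
        System.out.println(conf.getMapOutputValueClass().getSimpleName());
        System.out.println(conf.getOutputValueClass().getSimpleName());
    }
}
```

With this reading, a job that never calls the setMapOutput* methods behaves exactly as before, which is the backwards compatibility noted in the comment above.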