From: "Teppo Kurki (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Date: Thu, 20 Apr 2006 17:04:06 +0000 (GMT+00:00)
Message-ID: <8418976.1145552646904.JavaMail.jira@brutus>
Subject: [jira] Updated: (HADOOP-115) Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.
In-Reply-To: <393765989.1143819100222.JavaMail.jira@ajax>

     [ http://issues.apache.org/jira/browse/HADOOP-115?page=all ]

Teppo Kurki updated HADOOP-115:
-------------------------------

    Attachment: hadoop-115_ReduceTask.patch

Patch including

TestReduceTask
- generates a bunch of SequenceFiles and reduces them by running a single ReduceTask
- two test methods, one where input is just copied to output and one where the Reducer swaps keys and values
- Reducer checks that all generated key-value pairs are reduced by key
- checks that the resulting output file contains what it's supposed to

JobConf
- the necessary set/getMapOutputKey/ValueClass methods
- getOutputComparator uses MapKeyClass if one is specified

ReduceTask
- append and sort phases get the classes from getMapOutput.. methods

This should take care of the Reduce part of the problem. MapTask should also be adjusted accordingly, but since I haven't written a test for that, I haven't done it yet.

Owen, I didn't get your comment on handling the combiners - doesn't the combiner just use the map OutputCollector underneath, so that, as you put it,

map: k1,v1 -> seq(k2,v2)
combine: k2,seq(v2) -> seq(k2,v2)

the outputs are exactly the same, even if the combiner is technically a Reducer?

> Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-115
>          URL: http://issues.apache.org/jira/browse/HADOOP-115
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Reporter: Runping Qi
>  Attachments: hadoop-115_ReduceTask.patch, hadoop-115_tk.patch
>
> When map tasks write intermediate data out, they always use the SequenceFile RecordWriter with key/value classes from the job object.
> When the reducers write the final results out, the output format is obtained from the job object. By default it is TextOutputFormat, and there is no conflict.
> However, if one wants to use SequenceFileOutputFormat for the final results, then the key/value classes are also obtained from the job object, the same as for the map tasks' output. Now we have a problem: it is impossible for the map outputs and reducer outputs to use different key/value classes if one wants the reducers to generate output in SequenceFileOutputFormat.
> A simple fix would be to add another two attributes to the JobConf class: mapOutputKeyClass and mapOutputValueClass. That allows the user to have different key/value classes for the intermediate and final outputs.
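
As a rough illustration of the fix proposed in the issue description (this is not the attached patch), the sketch below shows how separate map-output and final-output key/value classes could be held by a configuration object. The class name SketchJobConf is hypothetical and stands in for JobConf. The fallback to the final output classes when no map output class has been set is an assumption, modelled on the note that getOutputComparator uses the map key class "if one is specified".

```java
// Hypothetical stand-in for JobConf; NOT the real Hadoop class.
// It only illustrates keeping map-output and final-output classes apart.
class SketchJobConf {
    private Class<?> outputKeyClass;
    private Class<?> outputValueClass;
    private Class<?> mapOutputKeyClass;    // null means "not specified"
    private Class<?> mapOutputValueClass;  // null means "not specified"

    public void setOutputKeyClass(Class<?> c) { outputKeyClass = c; }
    public void setOutputValueClass(Class<?> c) { outputValueClass = c; }
    public void setMapOutputKeyClass(Class<?> c) { mapOutputKeyClass = c; }
    public void setMapOutputValueClass(Class<?> c) { mapOutputValueClass = c; }

    public Class<?> getOutputKeyClass() { return outputKeyClass; }
    public Class<?> getOutputValueClass() { return outputValueClass; }

    // Assumed behaviour: fall back to the final output key class
    // when no map output key class was specified.
    public Class<?> getMapOutputKeyClass() {
        return mapOutputKeyClass != null ? mapOutputKeyClass : outputKeyClass;
    }

    // Assumed behaviour: same fallback for the value class.
    public Class<?> getMapOutputValueClass() {
        return mapOutputValueClass != null ? mapOutputValueClass : outputValueClass;
    }
}

class MapOutputClassesDemo {
    public static void main(String[] args) {
        SketchJobConf conf = new SketchJobConf();

        // Final (reduce) output: String keys, Long values.
        conf.setOutputKeyClass(String.class);
        conf.setOutputValueClass(Long.class);

        // Intermediate (map) output with keys and values swapped,
        // as in the test where the Reducer swaps keys and values.
        conf.setMapOutputKeyClass(Long.class);
        conf.setMapOutputValueClass(String.class);

        System.out.println("map output:    " + conf.getMapOutputKeyClass().getName()
                + " / " + conf.getMapOutputValueClass().getName());
        System.out.println("reduce output: " + conf.getOutputKeyClass().getName()
                + " / " + conf.getOutputValueClass().getName());
    }
}
```

With accessors along these lines, the append and sort phases of a reduce task would read the map-output getters, while the final output format would read the plain output getters.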