Date: Mon, 3 Apr 2006 17:14:56 +0100 (BST)
From: "Owen O'Malley (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-115) Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.

[ http://issues.apache.org/jira/browse/HADOOP-115?page=comments#action_12372967 ]

Owen O'Malley commented on HADOOP-115:
--------------------------------------

I should have addressed the combiner before. *smile* Of course, the combiner's input and output types have to match the map output types. So it looks like:

map:     k1, v1      -> seq(k2, v2)
combine: k2, seq(v2) -> seq(k2, v2)
reduce:  k2, seq(v2) -> seq(k3, v3)

So the only extra code is to set/get the types for k2/v2 (or, equivalently, k3/v3), although I would recommend adding a type check in the reduce collector. It is completely upward compatible.

As for user confusion, I've already had to explain this restriction (k2 == k3 and v2 == v3) far more times than I'd like.

On a side note, we could hack around the problem by defining an OutputFormat that uses SequenceFileWriter but doesn't open the file until the first key/value pair is written, taking the types from those first instances. But that breaks as soon as someone puts the type check into the reduce collector.

> Hadoop should allow the user to use SequentialFileOutputformat as the output format and to choose key/value classes that are different from those for map output.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-115
> URL: http://issues.apache.org/jira/browse/HADOOP-115
> Project: Hadoop
> Type: Improvement
> Components: mapred
> Reporter: Runping Qi
>
> When map tasks write intermediate data out, they always use a SequenceFile RecordWriter with the key/value classes from the job object.
> When the reducers write the final results out, their output format is obtained from the job object.
> By default it is TextOutputFormat, so there is no conflict.
> However, if one wants to use SequenceFileOutputFormat for the final results, the key/value classes are also obtained from the job object, the same ones used for the map tasks' output. Now we have a problem: if one wants the reducers to generate output with SequenceFileOutputFormat, it is impossible for the map outputs and the reducer outputs to use different key/value classes.
> A simple fix would be to add two more attributes to the JobConf class: mapOutputKeyClass and mapOutputValueClass. That would allow the user to have different key/value classes for the intermediate and final outputs.
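The type flow in the comment (map: k1, v1 -> seq(k2, v2); combine: k2, seq(v2) -> seq(k2, v2); reduce: k2, seq(v2) -> seq(k3, v3)) can be written down as generic Java signatures. This is only a minimal, self-contained sketch and not Hadoop's API: TypedMapper, TypedReducer, TypedPipeline and WordCountDemo are hypothetical names, a TreeMap stands in for the sort/shuffle, and the combiner runs once over all map output rather than per map task.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical generic interfaces (not Hadoop's API) that encode the type flow.
interface TypedMapper<K1, V1, K2, V2> {
    // map: k1, v1 -> seq(k2, v2)
    void map(K1 key, V1 value, BiConsumer<K2, V2> out);
}

interface TypedReducer<KIn, VIn, KOut, VOut> {
    // reduce: k2, seq(v2) -> seq(k3, v3); a combiner is the case KOut = KIn, VOut = VIn
    void reduce(KIn key, List<VIn> values, BiConsumer<KOut, VOut> out);
}

final class TypedPipeline {
    private TypedPipeline() {}

    // Append a value to its key's group; the TreeMap plays the role of the sort.
    private static <K extends Comparable<K>, V> void group(TreeMap<K, List<V>> groups, K key, V value) {
        groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<Map.Entry<K3, V3>> run(
            List<Map.Entry<K1, V1>> input,
            TypedMapper<K1, V1, K2, V2> mapper,
            TypedReducer<K2, V2, K2, V2> combiner,   // must keep k2/v2
            TypedReducer<K2, V2, K3, V3> reducer) {  // may emit k3/v3
        TreeMap<K2, List<V2>> mapped = new TreeMap<>();
        for (Map.Entry<K1, V1> rec : input) {
            mapper.map(rec.getKey(), rec.getValue(), (k, v) -> group(mapped, k, v));
        }
        TreeMap<K2, List<V2>> combined = new TreeMap<>();
        mapped.forEach((k, vs) -> combiner.reduce(k, vs, (ck, cv) -> group(combined, ck, cv)));
        List<Map.Entry<K3, V3>> output = new ArrayList<>();
        combined.forEach((k, vs) -> reducer.reduce(k, vs, (rk, rv) -> output.add(new SimpleEntry<>(rk, rv))));
        return output;
    }
}

final class WordCountDemo {
    // Word count whose final types (Integer, String) differ from the
    // intermediate types (String, Integer): the k2 != k3, v2 != v3 case.
    static List<Map.Entry<Integer, String>> run(List<String> lines) {
        List<Map.Entry<Long, String>> input = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            input.add(new SimpleEntry<>((long) i, lines.get(i)));
        }
        TypedMapper<Long, String, String, Integer> mapper = (offset, line, out) -> {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) out.accept(w, 1);
            }
        };
        // Combiner: sums partial counts and keeps (String, Integer).
        TypedReducer<String, Integer, String, Integer> sum = (word, counts, out) ->
            out.accept(word, counts.stream().mapToInt(Integer::intValue).sum());
        // Reducer: sums and emits (count, word), changing both types.
        TypedReducer<String, Integer, Integer, String> invert = (word, counts, out) ->
            out.accept(counts.stream().mapToInt(Integer::intValue).sum(), word);
        return TypedPipeline.run(input, mapper, sum, invert);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b a", "b a c")));
    }
}
```

Running WordCountDemo.main prints [3=a, 2=b, 1=c]. Because the combiner parameter is typed TypedReducer<K2, V2, K2, V2>, the compiler itself enforces that the combiner cannot change the intermediate types, while the reducer is free to emit different ones.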
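The fix proposed in the issue, combined with the reduce-collector type check recommended in the comment, might look roughly like the sketch below. JobTypes and CheckedCollector are hypothetical stand-ins, not the real JobConf or collector classes. Two choices are assumptions: unset map-output classes fall back to the final output classes (one way to get the "completely upward compatible" behaviour), and the check compares classes exactly rather than allowing subclasses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two extra JobConf attributes proposed in the
// issue; class and method names are illustrative only.
final class JobTypes {
    private final Class<?> outputKeyClass;    // k3
    private final Class<?> outputValueClass;  // v3
    private Class<?> mapOutputKeyClass;       // k2, null means "not set"
    private Class<?> mapOutputValueClass;     // v2, null means "not set"

    JobTypes(Class<?> outputKeyClass, Class<?> outputValueClass) {
        this.outputKeyClass = outputKeyClass;
        this.outputValueClass = outputValueClass;
    }

    void setMapOutputKeyClass(Class<?> c) { mapOutputKeyClass = c; }
    void setMapOutputValueClass(Class<?> c) { mapOutputValueClass = c; }

    Class<?> getOutputKeyClass() { return outputKeyClass; }
    Class<?> getOutputValueClass() { return outputValueClass; }

    // Unset map-output types fall back to the final output types, so existing
    // jobs that rely on k2 == k3 and v2 == v3 keep working unchanged.
    Class<?> getMapOutputKeyClass() {
        return mapOutputKeyClass != null ? mapOutputKeyClass : outputKeyClass;
    }

    Class<?> getMapOutputValueClass() {
        return mapOutputValueClass != null ? mapOutputValueClass : outputValueClass;
    }
}

// Hypothetical reduce-side collector that rejects pairs whose runtime classes
// are not exactly the declared output classes.
final class CheckedCollector {
    private final Class<?> keyClass;
    private final Class<?> valueClass;
    private final List<Object[]> written = new ArrayList<>(); // stands in for the output file

    CheckedCollector(Class<?> keyClass, Class<?> valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }

    void collect(Object key, Object value) {
        if (key.getClass() != keyClass) {
            throw new IllegalArgumentException("wrong key class: " + key.getClass().getName()
                + " is not " + keyClass.getName());
        }
        if (value.getClass() != valueClass) {
            throw new IllegalArgumentException("wrong value class: " + value.getClass().getName()
                + " is not " + valueClass.getName());
        }
        written.add(new Object[] {key, value});
    }

    int size() { return written.size(); }
}

final class TypeCheckDemo {
    public static void main(String[] args) {
        JobTypes job = new JobTypes(Integer.class, String.class);
        // Nothing set yet, so k2 defaults to k3.
        System.out.println("map output key: " + job.getMapOutputKeyClass().getSimpleName());
        job.setMapOutputKeyClass(String.class);
        job.setMapOutputValueClass(Integer.class);
        System.out.println("map output key: " + job.getMapOutputKeyClass().getSimpleName());

        CheckedCollector reduceOut = new CheckedCollector(job.getOutputKeyClass(), job.getOutputValueClass());
        reduceOut.collect(3, "a"); // (Integer, String) matches k3/v3
        try {
            reduceOut.collect("a", 3); // (String, Integer) is k2/v2, not k3/v3
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A job that never sets the map-output classes behaves exactly as before, while a job that does set them can emit different final types; the collector then catches a reducer that accidentally emits intermediate-typed pairs.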
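The side-note workaround, and why it conflicts with the recommended type check, can be illustrated the same way. LazyTypedWriter and LazyWriterDemo are hypothetical; an in-memory list stands in for the SequenceFile, and the writer fixes its key/value classes from the first pair it receives. Because a job using this hack still declares only one pair of types (which must be the map-output types), a reduce-side check against those declared types rejects legitimate (k3, v3) output before the lazy writer ever sees it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer that "opens" lazily and takes its key/value classes
// from the first pair appended; the list stands in for the SequenceFile.
final class LazyTypedWriter {
    private Class<?> keyClass;   // null until the first append
    private Class<?> valueClass;
    private final List<String> records = new ArrayList<>();

    void append(Object key, Object value) {
        if (keyClass == null) {
            keyClass = key.getClass();     // first instances fix the file's types
            valueClass = value.getClass();
        } else if (key.getClass() != keyClass || value.getClass() != valueClass) {
            throw new IllegalArgumentException("pair does not match the types of the first record");
        }
        records.add(key + "\t" + value);
    }

    Class<?> keyClass() { return keyClass; }
    Class<?> valueClass() { return valueClass; }
    int size() { return records.size(); }
}

final class LazyWriterDemo {
    // Under the hack the job declares one pair of types, which must be the
    // map-output types (k2, v2) because the map side relies on them.
    static final Class<?> DECLARED_KEY = String.class;
    static final Class<?> DECLARED_VALUE = Integer.class;

    // Reduce-side collector with the recommended type check against the
    // declared job types, feeding the lazy writer.
    static void checkedCollect(LazyTypedWriter writer, Object key, Object value) {
        if (key.getClass() != DECLARED_KEY || value.getClass() != DECLARED_VALUE) {
            throw new IllegalArgumentException("reduce output does not match the declared job types");
        }
        writer.append(key, value);
    }

    public static void main(String[] args) {
        // Without the check, the lazy writer adopts (Integer, String).
        LazyTypedWriter unchecked = new LazyTypedWriter();
        unchecked.append(3, "a");
        System.out.println(unchecked.keyClass().getSimpleName() + ", " + unchecked.valueClass().getSimpleName());

        // With the check, the same (k3, v3) pair never reaches the writer.
        LazyTypedWriter checked = new LazyTypedWriter();
        try {
            checkedCollect(checked, 3, "a");
        } catch (IllegalArgumentException e) {
            System.out.println("hack breaks: " + e.getMessage() + " (records written: " + checked.size() + ")");
        }
    }
}
```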