Date: Mon, 7 May 2012 15:34:50 +0000 (UTC)
From: "Harsh J (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Message-ID: <1466723311.34704.1336404890472.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAPREDUCE-2001) Enhancement to SequenceFileOutputFormat to allow user to set MetaData

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269698#comment-13269698 ]

Harsh J commented on MAPREDUCE-2001:
------------------------------------

bq. Why would the configure method of the mapper care if the recordwriter/outputformat had been created yet?

It doesn't care on its own, but advanced users may be relying on this 'behavior' to do stuff. For instance, I once relied on it to have my output format inject a few strings into the jobconf upon RW instantiation (some logic dependent on input format initialization, which happens even before this), so that I could then read those strings in my mapper's configure. True, I was probably doing something wrong, and what I did can be done in a better/alternate way, but I ended up relying on that behavior, and that's what I'm talking about (in terms of breakage).

bq. I would think we would want the recordwriter/outputformat to get configured after the configure method to allow tasks to make task level config changes to a recordwriter/outputformat

True. I just don't know why it's this way in the old API. Probably an oversight.

bq. I am confused by this comment, do you agree with my approach or are you just disappointed that the behavior will be inconsistent between the old and new api for map only jobs?

Sorry for the confusion. It's just the latter. I agree with your approach.
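
To make the breakage concern concrete, here is a minimal sketch of the ordering dependency described above. It is not Hadoop code: a plain Map stands in for the jobconf, and InitOrderDemo, INJECTED_KEY, createRecordWriter and configureMapper are hypothetical names made up for illustration. It only shows that code relying on "record writer first" sees nothing once the order is flipped.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in model of the initialization order discussed above; not Hadoop code. */
class InitOrderDemo {
    // Hypothetical key, used only for this illustration.
    static final String INJECTED_KEY = "example.injected.value";

    /** Stand-in for an output format that mutates the jobconf when its record writer is created. */
    static void createRecordWriter(Map<String, String> jobConf) {
        jobConf.put(INJECTED_KEY, "set-by-output-format");
    }

    /** Stand-in for the mapper's configure step: returns what the mapper observes. */
    static String configureMapper(Map<String, String> jobConf) {
        return jobConf.getOrDefault(INJECTED_KEY, "<unset>");
    }

    /** Runs the two steps in the requested order and returns what the mapper saw. */
    static String run(boolean writerBeforeConfigure) {
        Map<String, String> jobConf = new HashMap<>();
        String seen;
        if (writerBeforeConfigure) {
            // Order that the code in the comment above relied on.
            createRecordWriter(jobConf);
            seen = configureMapper(jobConf);
        } else {
            // Flipped order: configure runs before the writer exists.
            seen = configureMapper(jobConf);
            createRecordWriter(jobConf);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("writer first:    " + run(true));
        System.out.println("configure first: " + run(false));
    }
}
```

With the flipped order the mapper simply finds the key unset, so any job depending on the injected value would need to obtain it some other way.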
> Enhancement to SequenceFileOutputFormat to allow user to set MetaData
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-2001
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2001
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.20.2
> Reporter: David Rosenstrauch
> Priority: Minor
> Attachments: MAPREDUCE-2001.patch
>
>
> The org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat class currently does not provide a way for the user to pass in a MetaData object to be written to the SequenceFile.
> Currently the only way for a developer to implement this functionality appears to be to create a subclass which overrides the SequenceFileOutputFormat's getRecordWriter() method, which is a bit of a kludge.
> This seems to be a common enough request to warrant a fix of some sort. (It's already been brought up twice in the past year: http://www.mail-archive.com/common-user@hadoop.apache.org/msg02198.html and http://www.mail-archive.com/mapreduce-user@hadoop.apache.org/msg00904.html)
> A couple of possible solutions:
> 1) Provide a static method SequenceFileOutputFormat.setMetaData(Job, MetaData).
> 2) Provide a (non-static) setMetaData() method on the SequenceFileOutputFormat class. The user would create a subclass of SequenceFileOutputFormat which, say, implements Configurable. Then in the setConf() method, the user could create the MetaData object (using data from the Configuration), and then call setMetaData. The SequenceFileOutputFormat would then use this MetaData object when creating the SequenceFile. (Note that the user would have to create a subclass of SequenceFileOutputFormat to make this solution work.)
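
For illustration only, one way option 1 could carry the metadata from the job client to the tasks is to flatten it into prefixed configuration entries and rebuild it before the file is opened. The sketch below is an assumption, not the attached patch: plain Maps stand in for both the Configuration and the MetaData object, and MetaDataConfSketch, PREFIX, setMetaData and getMetaData are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Stand-in model of option 1; plain Maps replace Configuration and MetaData. */
class MetaDataConfSketch {
    // Hypothetical key prefix; not an actual Hadoop configuration property.
    static final String PREFIX = "example.seqfile.metadata.";

    /** Job side: analogous in spirit to the proposed static setMetaData(Job, MetaData). */
    static void setMetaData(Map<String, String> conf, Map<String, String> metaData) {
        for (Map.Entry<String, String> e : metaData.entrySet()) {
            conf.put(PREFIX + e.getKey(), e.getValue());
        }
    }

    /** Task side: collect the prefixed entries back into a metadata map. */
    static Map<String, String> getMetaData(Map<String, String> conf) {
        Map<String, String> metaData = new TreeMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                metaData.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return metaData;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("unrelated.key", "ignored");

        Map<String, String> metaData = new HashMap<>();
        metaData.put("schema.version", "3");
        setMetaData(conf, metaData);

        System.out.println(getMetaData(conf));
    }
}
```

The appeal of this shape is that the user never has to subclass the output format; the cost is that the metadata must be representable as strings in the job configuration.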