From: "Owen O'Malley (JIRA)"
To: core-dev@hadoop.apache.org
Date: Fri, 25 Jul 2008 09:23:32 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1230) Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes

    [ https://issues.apache.org/jira/browse/HADOOP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616935#action_12616935 ]

Owen O'Malley commented on HADOOP-1230:
---------------------------------------

Doug proposed checking the code in as we work on this patch, because it isn't called by the rest of the code and will be far easier to review. So the new API is in src/mapred/org/apache/hadoop/mapreduce.

Notable changes since the last patch:

* The Mapper and Reducer have a new format that combines MapRunnable with Mapper and introduces a similar style for Reducer. Their templating is now much easier to understand and use. (A rough sketch of the user-facing style follows this list.)
* The Mapper and Reducer base classes are now the identity functions.
* I've split the context object into a tree where the lower ones inherit from the one above:
  * JobContext - information about the job
  * TaskAttemptContext - information about the task
  * TaskInputOutputContext - adds the input and output methods for the task
  * MapperContext and ReducerContext provide the specific methods for each
* I added Job, which is how the user sets up, submits, waits for, and gets the status of jobs. Job also allows killing the job or tasks. (Also sketched after this list.)
* I split the lib directory into parts for in, map, reduce, partition, and out to give a little hierarchy.
* I filled in {Text,SequenceFile}{In,Out}putFormat to make sure that I had the interfaces right.
* I changed the input methods to match the serialization factory interfaces.
* JobConf goes away, to be replaced by Configuration. The getter methods in JobConf mostly go to JobContext. The setter methods mostly go to Job.
* A word count example is included. That would clearly be moved to the example source tree when we do the final commit.
* I removed the number of mappers and replaced it with a max split size. The old model was very confusing to explain.
* I used all new attribute names so that we don't have collisions with the old attributes.
* In the Mapper, the Mapper owns the input key and value, which made the multi-threaded mapper easier to do. I need a similar scheme in ReduceContext.getValues.
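To give a feel for the combined Mapper/MapRunnable style, here is a rough sketch of what a word count Mapper and Reducer might look like against the new context objects. The signatures shown (map with a Context parameter, context.write, an Iterable of values in reduce) follow the shape the org.apache.hadoop.mapreduce API eventually settled on and are assumptions here; the attached patch may differ, e.g. a single-argument map(MapContext).

// Sketch only: illustrates the context-object style; not copied from the patch.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountSketch {

  // The Mapper base class is the identity function, so users only override
  // map() -- or run() if they want the whole input loop, as MapRunnable allowed.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // replaces OutputCollector.collect(...)
      }
    }
  }

  // The Reducer gets the same treatment: one context object instead of the
  // (key, values, output, reporter) parameter list.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);   // progress reporting also goes through context
    }
  }
}

Because each Mapper owns its current input key and value rather than sharing them, a multi-threaded mapper built on this style has less state to coordinate.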
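And since JobConf is replaced by Configuration with the setters moving to Job, driver code might look roughly like the following. Again this is a sketch: the Job constructor, the setter names, and the lib.input/lib.output package names are taken from how the API eventually landed, not necessarily from this patch.

// Sketch only: job setup, submission, and waiting through the new Job class.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();          // plain Configuration; no JobConf
    Job job = new Job(conf, "word count");             // former JobConf setters live on Job
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountSketch.TokenizerMapper.class);
    job.setReducerClass(WordCountSketch.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Job is also the handle for waiting, polling status, or killing the job/tasks.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Everything the user touches sits on Job or Configuration, rather than being split across JobConf and JobClient as in the old API.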
Missing:

* I need an interface to query jobs that were submitted by another process. Probably a JobTracker class is the best bet, providing query options and returning Jobs.
* I didn't move TaskCompletionEvents yet.

> Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1230
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1230
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: context-objs-2.patch, context-objs-3.patch, context-objs.patch
>
>
> This is a big change, but it will future-proof our APIs. To maintain backwards compatibility, I'd suggest that we move over to a new package name (org.apache.hadoop.mapreduce) and deprecate the old interfaces and package. Basically, it will replace:
>
>   package org.apache.hadoop.mapred;
>   public interface Mapper extends JobConfigurable, Closeable {
>     void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter) throws IOException;
>   }
>
> with:
>
>   package org.apache.hadoop.mapreduce;
>   public interface Mapper extends Closeable {
>     void map(MapContext context) throws IOException;
>   }
>
> where MapContext has methods like getKey(), getValue(), collect(Key, Value), progress(), etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.