From: "Hong Tang (JIRA)"
To: core-dev@hadoop.apache.org
Date: Tue, 3 Mar 2009 19:47:56 -0800 (PST)
Subject: [jira] Commented: (HADOOP-5368) more user control on customized RecordReader

    [ https://issues.apache.org/jira/browse/HADOOP-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678567#action_12678567 ]

Hong Tang commented on HADOOP-5368:
-----------------------------------

Can your record reader put the file name as (part of) your input key?
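A minimal sketch of that suggestion (an illustration, not code from the thread), assuming the old org.apache.hadoop.mapred API: delegate the real I/O to a LineRecordReader and emit a Text key of the form "<file>\t<offset>", so the mapper can recover the source file from the key alone. FileTaggingRecordReader and the key layout are hypothetical names.

{code}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;

/** Hypothetical reader that carries the source file name inside the key. */
public class FileTaggingRecordReader implements RecordReader<Text, Text> {
    private final LineRecordReader lines;   // delegate that does the actual reading
    private final String fileName;          // file this split belongs to
    private final LongWritable offset = new LongWritable();
    private final Text line = new Text();

    public FileTaggingRecordReader(JobConf job, FileSplit split) throws IOException {
        this.lines = new LineRecordReader(job, split);
        this.fileName = split.getPath().toString();
    }

    @Override
    public boolean next(Text key, Text value) throws IOException {
        if (!lines.next(offset, line)) {
            return false;
        }
        // The key now identifies the file, so the mapper can pick a
        // per-file strategy without casting the (wrapped) RecordReader.
        key.set(fileName + "\t" + offset.get());
        value.set(line);
        return true;
    }

    @Override
    public Text createKey() { return new Text(); }

    @Override
    public Text createValue() { return new Text(); }

    @Override
    public long getPos() throws IOException { return lines.getPos(); }

    @Override
    public float getProgress() throws IOException { return lines.getProgress(); }

    @Override
    public void close() throws IOException { lines.close(); }
}
{code}

A matching InputFormat would return this reader from getRecordReader(InputSplit, JobConf, Reporter), and the mapper would split the key on the tab to recover the path.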
> more user control on customized RecordReader
> --------------------------------------------
>
>                 Key: HADOOP-5368
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5368
>             Project: Hadoop Core
>          Issue Type: Wish
>            Reporter: he yongqiang
>
> Currently a user can define their own InputFormat and RecordReader, but has little control over them.
> For example, suppose we feed multiple files into the mapper and want to handle them in different ways depending on which file the mapper is working on.
> This could easily be done as follows:
> {code}
> public class BlockMapRunner implements MapRunnable {
>     private BlockMapper mapper;
>
>     @Override
>     public void run(RecordReader input, OutputCollector output,
>             Reporter reporter) throws IOException {
>         if (mapper == null)
>             return;
>         BlockReader blkReader = (BlockReader) input;
>         this.mapper.initialize(input);
>         // ...
>     }
>
>     @Override
>     public void configure(JobConf job) {
>         JobConf work = new JobConf(job);
>         Class mapCls = work.getBlockMapperClass();
>         if (mapCls != null) {
>             this.mapper = (BlockMapper) ReflectionUtils.newInstance(mapCls, job);
>         }
>     }
> }
>
> /*
>  * BlockMapper implements Mapper and is initialized from the RecordReader,
>  * from which we learn which file this mapper is working on and find the
>  * right strategy for it.
>  */
> public class ExtendedMapper extends BlockMapper {
>     private Strategy strategy;
>     private Configuration work;
>
>     @Override
>     public void configure(Configuration job) {
>         this.work = job;
>     }
>
>     @Override
>     public void initialize(RecordReader reader) throws IOException {
>         // ((UserDefinedRecordReader) reader) is wrong; see the note below.
>         String path = ((UserDefinedRecordReader) reader).which_File_We_Are_Working_On();
>         this.strategy = this.work.getStrategy(path);
>     }
>
>     @Override
>     public void map(Key k, V value, OutputCollector output, Reporter reporter)
>             throws IOException {
>         strategy.handle(k, value);
>     }
> }
> {code}
> {color:red}
> However, the above code does not work. The reader passed into the mapper is wrapped by MapTask and is either a SkippingRecordReader or a TrackedRecordReader. We cannot cast it back, and we cannot pass any information through the user-defined RecordReader. If SkippingRecordReader and TrackedRecordReader had a method for getting the raw reader, this problem would not exist.
> {color}
> This problem could be resolved by launching many map-reduce jobs, one per file, but that is clearly not what we want.
> Or do other solutions exist?
> Any comments are appreciated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
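One more minimal sketch of a possible workaround for the wrapping problem described in the issue, also hypothetical: when the input splits are FileSplits, the framework publishes the current file's path in the job configuration under "map.input.file", so a mapper can choose its per-file strategy in configure() without casting the RecordReader at all. PerFileMapper and its output layout are illustrative only.

{code}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

/** Hypothetical mapper that picks a per-file strategy from the job conf. */
public class PerFileMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    private String inputFile;

    @Override
    public void configure(JobConf job) {
        // For FileSplit inputs, the framework sets this property to the
        // path of the current split's file before the mapper runs.
        this.inputFile = job.get("map.input.file");
    }

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // A real job would dispatch to a per-file strategy here; this
        // sketch just tags each record with the file it came from.
        output.collect(new Text(inputFile), value);
    }
}
{code}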