Message-ID: <6833315.1183057144721.JavaMail.jira@brutus>
Date: Thu, 28 Jun 2007 11:59:04 -0700 (PDT)
From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-475) The value iterator to reduce function should be clonable
In-Reply-To: <28640755.1156462739379.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508905 ]

Doug Cutting commented on HADOOP-475:
-------------------------------------

I think the original intention was just to make the iterator cloneable, so that a reducer could iterate through the values more than once. It might first, e.g., count the total number of values and then, having this count, filter them or some such. This should work even when the values would not all fit in memory.

Whether this feature is still interesting to folks I cannot say...

> The value iterator to reduce function should be clonable
> --------------------------------------------------------
>
>                 Key: HADOOP-475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-475
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Owen O'Malley
>
> In the current framework, when the user implements the reduce method of the Reducer class,
> the user can only iterate through the value iterator once.
> This makes it hard for the user to perform join-like operations within the reduce method.
> To address this problem, one approach is to make the input value iterator clonable. Then the user can iterate over the values in different ways.
> If the iterator can be reset, then the user can perform nested iterations over the data and thus
> carry out join-like operations.
> The user code in the reduce method would be something like:
>     iterator1 = values.clone();
>     iterator2 = values.clone();
>     while (iterator1.hasNext()) {
>         val1 = iterator1.next();
>         iterator2.reset();
>         while (iterator2.hasNext()) {
>             val2 = iterator2.next();
>             do something based on val1 and val2
>             .......................
>         }
>     }
> One possible optimization is that if the values are sorted based on a secondary key,
> the reset function can take a secondary key as an argument and reset the iterator to the beginning
> position of the secondary key. It would be very helpful to have a utility that returns a list of iterators,
> one per secondary key value, from the given iterator:
>     TreeMap getIteratorsBasedOnSecondaryKey(iterator);
> Each entry in the returned map object is a pair of .
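
For illustration only, here is a minimal Java sketch of the two usage patterns discussed above: the nested (join-like) loop from the issue description and the count-then-filter pass from the comment. Nothing in it is Hadoop API. ResettableIterator, ListBackedIterator, copy(), allPairs() and aboveMean() are hypothetical names, copy() is assumed to start from the first value, and "keep values above the mean" is just one possible filter. The values are buffered in a List for brevity, so this sketch does not address the case where the values do not fit in memory; a real implementation would presumably re-read them from disk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical interface (not part of Hadoop): an iterator that can be
// rewound and copied, roughly as the issue proposes.
interface ResettableIterator<T> extends Iterator<T> {
    void reset();                  // rewind to the first value
    ResettableIterator<T> copy();  // independent cursor, starting at the first value
}

// Assumption: values are held in memory. This keeps the sketch short but is
// exactly what a production version would need to avoid for large groups.
final class ListBackedIterator<T> implements ResettableIterator<T> {
    private final List<T> values;
    private int position = 0;

    ListBackedIterator(List<T> values) { this.values = values; }

    @Override public boolean hasNext() { return position < values.size(); }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return values.get(position++);
    }

    @Override public void reset() { position = 0; }

    @Override public ResettableIterator<T> copy() {
        return new ListBackedIterator<T>(values);
    }
}

final class ResettableIteratorSketch {
    // Nested iteration over two cursors, mirroring the loop in the issue.
    static <T> List<String> allPairs(ResettableIterator<T> values) {
        ResettableIterator<T> outer = values.copy();
        ResettableIterator<T> inner = values.copy();
        List<String> pairs = new ArrayList<>();
        while (outer.hasNext()) {
            T val1 = outer.next();
            inner.reset();
            while (inner.hasNext()) {
                T val2 = inner.next();
                pairs.add(val1 + "-" + val2);  // "do something" with val1 and val2
            }
        }
        return pairs;
    }

    // Two passes over the same values: first count and sum, then filter.
    static List<Integer> aboveMean(ResettableIterator<Integer> values) {
        long count = 0;
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next();
            count++;
        }
        values.reset();
        List<Integer> kept = new ArrayList<>();
        while (values.hasNext()) {
            int v = values.next();
            if (v * count > sum) kept.add(v);  // v > sum / count, without division
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(allPairs(new ListBackedIterator<>(List.of("x", "y"))));
        System.out.println(aboveMean(new ListBackedIterator<>(List.of(1, 2, 9))));
    }
}
```

The second method is the case the comment describes: a single cursor is consumed once to obtain the count, rewound, and consumed again, so only reset() is needed there. The nested loop additionally needs two independent cursors, which is where clone() (copy() in this sketch) comes in.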
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.