From: Dennis Kubes
Date: Thu, 25 May 2006 11:01:01 -0500
To: hadoop-user@lucene.apache.org
Subject: Re: Help with MapReduce

The problem is that I have a single url.
I get the inlinks to that url, and then I need to access the content of every inlink url that has been fetched. I was doing this through random access. But then I went back and re-read the Google MapReduce paper and saw that it was designed for sequential access, and that Hadoop is implemented the same way. So far I haven't found a way to solve this kind of problem efficiently with sequential access.

If I were to do it in configure() and close(), wouldn't that still open a single reader per map call?

Dennis

Doug Cutting wrote:
> Dennis Kubes wrote:
>> I am trying to read a MapFile inside mapper and reducer
>> implementations. So far the only way I have found to do it is by
>> opening a new reader for each map and reduce call. Is anybody doing
>> something similar and if so is there a way to open a single reader
>> and reuse it across multiple map or reduce calls?
>
> Can't you open it in the configure() implementation? And close it in
> the close() implementation?
>
> Are you randomly accessing a MapFile from a map() implementation?
> That's not going to scale very well. MapReduce is designed for
> sequential access.
>
> Doug
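
On the configure()/close() question: as far as I understand the mapred API, configure() runs once when a task's mapper object is set up and close() runs once when the task finishes, so a reader opened in configure() is shared by every map() call in that task. Below is a minimal sketch of that lifecycle in plain Java, with no Hadoop dependency. CountingReader, ReaderLifecycleSketch and runTask are hypothetical names made up for illustration; CountingReader only stands in for a real MapFile reader so the number of opens can be counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a MapFile reader; it only counts how often it is opened. */
class CountingReader {
    static int opens = 0;                     // total readers constructed so far
    private final Map<String, String> data;
    private boolean closed = false;

    CountingReader(Map<String, String> data) {
        this.data = data;
        opens++;
    }

    String get(String key) {
        if (closed) {
            throw new IllegalStateException("reader already closed");
        }
        return data.get(key);
    }

    void close() {
        closed = true;
    }
}

/** Plain-Java model of one map task: configure() once, map() per record, close() once. */
class ReaderLifecycleSketch {
    private final Map<String, String> store;
    private CountingReader reader;
    private final List<String> output = new ArrayList<>();

    ReaderLifecycleSketch(Map<String, String> store) {
        this.store = store;
    }

    /** Called once per task: the only place a reader is opened. */
    void configure() {
        reader = new CountingReader(store);
    }

    /** Called once per input record: reuses the reader opened in configure(). */
    void map(String url) {
        String content = reader.get(url);
        if (content != null) {
            output.add(url + "=" + content);
        }
    }

    /** Called once per task: releases the shared reader. */
    void close() {
        reader.close();
    }

    /** Simulates what the framework does around a mapper (an assumption of this sketch). */
    static List<String> runTask(Map<String, String> store, List<String> urls) {
        ReaderLifecycleSketch task = new ReaderLifecycleSketch(store);
        task.configure();
        try {
            for (String url : urls) {
                task.map(url);
            }
        } finally {
            task.close();
        }
        return task.output;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("http://a.example/", "content of a");
        store.put("http://b.example/", "content of b");
        List<String> urls = new ArrayList<>();
        urls.add("http://a.example/");
        urls.add("http://b.example/");
        urls.add("http://a.example/");
        System.out.println(runTask(store, urls));
        System.out.println("readers opened: " + CountingReader.opens);
    }
}
```

This removes the per-call open cost, but each map() call still does a random lookup, which is the scaling concern Doug raises.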
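
On the sequential-access point: one common way to recast "look up the content of each inlink" is a reduce-side join, where both the inlink records and the fetched content are keyed by the source url so that everything for one url arrives together. This technique is not proposed anywhere in the thread; the sketch below is only an illustration in plain Java, with a TreeMap standing in for the sort and shuffle. InlinkJoinSketch, Group and contentByTarget are hypothetical names, and the in-memory maps are assumed stand-ins for the real inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a reduce-side join keyed by source url. */
class InlinkJoinSketch {

    /** Everything that shares one source-url key after the simulated shuffle. */
    static class Group {
        String content;                               // null if the source page was never fetched
        final List<String> targets = new ArrayList<>();
    }

    /**
     * inlinks: target url -> urls that link to it.
     * fetched: url -> page content, for pages that have been fetched.
     * Returns target url -> content of each fetched inlink.
     */
    static Map<String, List<String>> contentByTarget(Map<String, List<String>> inlinks,
                                                     Map<String, String> fetched) {
        // TreeMap plays the role of the sorted shuffle: one group per source url.
        Map<String, Group> shuffle = new TreeMap<>();

        // "Map" over the inlink records: emit (source, target).
        for (Map.Entry<String, List<String>> e : inlinks.entrySet()) {
            for (String source : e.getValue()) {
                shuffle.computeIfAbsent(source, k -> new Group()).targets.add(e.getKey());
            }
        }

        // "Map" over the fetched content: emit (url, content) under the same key space.
        for (Map.Entry<String, String> e : fetched.entrySet()) {
            shuffle.computeIfAbsent(e.getKey(), k -> new Group()).content = e.getValue();
        }

        // "Reduce": one sequential pass over the groups, then regroup by target url.
        Map<String, List<String>> result = new TreeMap<>();
        for (Group g : shuffle.values()) {
            if (g.content == null) {
                continue;                             // inlink source was not fetched
            }
            for (String target : g.targets) {
                result.computeIfAbsent(target, k -> new ArrayList<>()).add(g.content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inlinks = new HashMap<>();
        inlinks.put("http://x.example/", Arrays.asList("http://a.example/", "http://b.example/"));
        Map<String, String> fetched = new HashMap<>();
        fetched.put("http://a.example/", "content of a");
        System.out.println(contentByTarget(inlinks, fetched));
    }
}
```

In a real job the final regrouping by target would likely be a second pass rather than an in-memory map; the point of the sketch is only that no step needs a random lookup into a MapFile.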