hadoop-common-user mailing list archives

From Dennis Kubes <nutch-...@dragonflymc.com>
Subject Re: Help with MapReduce
Date Thu, 25 May 2006 16:01:01 GMT
The problem is that I start with a single URL.  I get the inlinks to 
that URL, and then I need to access the content of every inlinking URL 
that has been fetched. 

I was doing this through random access.  But then I went back and 
re-read the Google MapReduce paper and saw that it was designed for 
sequential access, and that Hadoop implements it the same way.  So far 
I haven't found a way to solve this kind of problem efficiently with 
sequential access.
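
One common way to recast this kind of lookup as sequential access is a reduce-side join: key both the link records and the fetched content by the inlinking URL, so that grouping brings each page's content together with the targets it points to. The sketch below is a minimal illustration of that idea in plain Java, not the approach used in this thread. The class `InlinkContentJoin`, the `Tagged` wrapper, and the in-memory maps are all hypothetical stand-ins for the framework's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: in-memory collections stand in for the framework's
// sort-and-group step. None of these names come from Hadoop or Nutch.
class InlinkContentJoin {

    // A value tagged with its origin: the fetched content of the key URL,
    // or a target URL that the key URL links to.
    static final class Tagged {
        final boolean isContent;
        final String value;

        Tagged(boolean isContent, String value) {
            this.isContent = isContent;
            this.value = value;
        }
    }

    // inlinks: target URL -> URLs linking to it. content: fetched URL -> body.
    // Returns: target URL -> bodies of its fetched inlinking pages.
    static Map<String, List<String>> join(Map<String, List<String>> inlinks,
                                          Map<String, String> content) {
        // "Map" phase: key every record by the inlinking (source) URL.
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : inlinks.entrySet()) {
            for (String source : e.getValue()) {
                grouped.computeIfAbsent(source, k -> new ArrayList<>())
                       .add(new Tagged(false, e.getKey()));
            }
        }
        for (Map.Entry<String, String> e : content.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(new Tagged(true, e.getValue()));
        }

        // "Reduce" phase: each group holds one source URL's content (if it
        // was fetched) together with every target that it links to.
        Map<String, List<String>> result = new TreeMap<>();
        for (List<Tagged> group : grouped.values()) {
            String body = null;
            List<String> targets = new ArrayList<>();
            for (Tagged t : group) {
                if (t.isContent) {
                    body = t.value;
                } else {
                    targets.add(t.value);
                }
            }
            if (body == null) {
                continue; // the source page was never fetched
            }
            for (String target : targets) {
                result.computeIfAbsent(target, k -> new ArrayList<>()).add(body);
            }
        }
        return result;
    }
}
```

Both inputs are read once from start to finish, and no per-record lookup into a second file is needed. The cost is an extra pass and a shuffle over the content, which may or may not pay off when only a single target URL is of interest.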

If I were to open the reader in configure() and close it in close(), 
wouldn't that still open a single reader per map call?
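
For reference, in Hadoop's `org.apache.hadoop.mapred` API, configure() and close() are each invoked once per task, while map() is invoked once per input record, so a reader opened in configure() is shared by every map() call in that task. The sketch below only imitates that lifecycle in plain Java to show the effect on the number of opens. `CountingStore`, `Reader`, `LookupMapper`, and `runTask` are hypothetical names, not Hadoop classes, and the real configure() takes a JobConf argument.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the task lifecycle; these classes only imitate the
// shape of a mapper and are not Hadoop's own types.
class ReaderLifecycleDemo {

    // Stand-in for an on-disk lookup file that counts how often it is opened.
    static final class CountingStore {
        final Map<String, String> data = new HashMap<>();
        int opens = 0;

        Reader open() {
            opens++;
            return new Reader(data);
        }
    }

    // Stand-in for a random-access reader over the store.
    static final class Reader {
        private final Map<String, String> data;
        private boolean closed = false;

        Reader(Map<String, String> data) {
            this.data = data;
        }

        String get(String key) {
            if (closed) {
                throw new IllegalStateException("reader is closed");
            }
            return data.get(key);
        }

        void close() {
            closed = true;
        }
    }

    static final class LookupMapper {
        private final CountingStore store;
        private Reader reader;

        LookupMapper(CountingStore store) {
            this.store = store;
        }

        // Called once per task: open the shared reader.
        void configure() {
            reader = store.open();
        }

        // Called once per record: reuse the reader opened in configure().
        String map(String key) {
            return reader.get(key);
        }

        // Called once per task: release the reader.
        void close() {
            reader.close();
        }
    }

    // Imitates what the framework does for a single task.
    static List<String> runTask(CountingStore store, List<String> keys) {
        LookupMapper mapper = new LookupMapper(store);
        mapper.configure();
        List<String> out = new ArrayList<>();
        try {
            for (String key : keys) {
                out.add(mapper.map(key));
            }
        } finally {
            mapper.close();
        }
        return out;
    }
}
```

With this structure the store is opened once per task regardless of how many records the task processes. Each lookup is still a random access, though, so the scaling concern raised in the quoted reply still applies.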

Dennis

Doug Cutting wrote:
> Dennis Kubes wrote:
>> I am trying to read a MapFile inside mapper and reducer 
>> implementations.  So far the only way I have found to do it is by 
>> opening a new reader for each map and reduce call.  Is anybody doing 
>> something similar and if so is there a way to open a single reader 
>> and reuse it across multiple map or reduce calls?
>
> Can't you open it in the configure() implementation?  And close it in 
> the close() implementation?
>
> Are you randomly accessing a MapFile from a map() implementation? 
> That's not going to scale very well.  MapReduce is designed for 
> sequential access.
>
> Doug
