hadoop-common-user mailing list archives

From David Pollak <...@athena.com>
Subject some newby questions
Date Wed, 08 Nov 2006 00:03:23 GMT

Is there a way to store "by-product" data somewhere it can be read
later?  For example, as I iterate over a collection of documents, I
want to generate some statistics about the collection and put those
stats "someplace" that can be accessed during future map-reduce
cycles.  Should I simply run a "faux" map-reduce cycle to count the
information and store it in a known location in the DFS?
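
To make that concrete, here is a rough sketch of what I mean by a
"faux" counting pass.  It is plain Python standing in for a real job,
not the Hadoop API: the function names, the JSON format, the sample
documents, and the temp directory standing in for a DFS path are all
made up for illustration.

```python
import json
import os
import tempfile
from collections import Counter


def map_doc(doc_id, text):
    """Map step: emit (statistic_name, value) pairs for one document."""
    words = text.split()
    yield ("doc_count", 1)
    yield ("word_count", len(words))


def reduce_stats(pairs):
    """Reduce step: sum the values emitted for each statistic name."""
    totals = Counter()
    for name, value in pairs:
        totals[name] += value
    return dict(totals)


def run_stats_pass(docs, stats_path):
    """Run the counting pass and write the totals to a known location."""
    pairs = [pair
             for doc_id, text in docs.items()
             for pair in map_doc(doc_id, text)]
    totals = reduce_stats(pairs)
    with open(stats_path, "w") as f:
        json.dump(totals, f)
    return totals


def load_stats(stats_path):
    """What a later map-reduce cycle would do before it starts work."""
    with open(stats_path) as f:
        return json.load(f)


# Hypothetical input: two tiny documents keyed by URL.
docs = {
    "http://a.example/1": "the quick brown fox",
    "http://a.example/2": "the lazy dog",
}
# A temp directory stands in for the agreed-upon DFS location.
stats_path = os.path.join(tempfile.mkdtemp(), "collection-stats.json")
run_stats_pass(docs, stats_path)
stats = load_stats(stats_path)
```

The only contract between the counting pass and later cycles is the
path, so later jobs need nothing more than read access to that file.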

Is there a way to map a collection of words or documents to
associated numbers, so that indexing could be based on the word number
and/or document number rather than the actual word and actual URL?
Because the reduce tasks take place in separate processes, it seems
that there's no way to coordinate the ordinal counting.
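
One workaround I can imagine (an assumption on my part, not something
I've found in Hadoop) is to let each reduce task number its own keys
and interleave the results: reducer r out of R hands out r, r + R,
r + 2R, and so on.  The sketch below simulates that in plain Python;
`NUM_REDUCERS`, the CRC32 partitioner, and the function names are
hypothetical.

```python
import zlib

NUM_REDUCERS = 4  # assumed to be fixed and known to every reduce task


def partition(word, num_reducers=NUM_REDUCERS):
    """Deterministically pick the reducer that owns a word."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def assign_ids(words, reducer_index, num_reducers=NUM_REDUCERS):
    """Number the distinct words seen by one reducer without coordination.

    The k-th distinct word (in sorted order) on reducer r gets the id
    k * num_reducers + r, so ids from different reducers never collide.
    """
    ids = {}
    for local_ordinal, word in enumerate(sorted(set(words))):
        ids[word] = local_ordinal * num_reducers + reducer_index
    return ids


def number_words(all_words, num_reducers=NUM_REDUCERS):
    """Simulate the whole job: partition the words, then number each bucket."""
    buckets = [[] for _ in range(num_reducers)]
    for word in all_words:
        buckets[partition(word, num_reducers)].append(word)
    word_ids = {}
    for r, bucket in enumerate(buckets):
        word_ids.update(assign_ids(bucket, r, num_reducers))
    return word_ids


word_ids = number_words("the quick brown fox jumps over the lazy dog".split())
```

The ids come out unique but not dense: if one reducer sees fewer words
than another, there are gaps in the numbering.  That would be fine for
my indexing, but not if a contiguous 0..N-1 range is required.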

There's a MapFile construct that looks like it could be very useful  
for my application, but there's no documentation for MapFile.  Does  
anybody have pointers to documentation or example code?


