hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: What's the best way to get to a single key?
Date Mon, 03 Mar 2008 22:52:09 GMT
Use MapFileOutputFormat to write your data, then call:

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapFileOutputFormat.html#getEntry(org.apache.hadoop.io.MapFile.Reader[],%20org.apache.hadoop.mapred.Partitioner,%20K,%20V)

The documentation is pretty sparse, but the intent is that you open a 
MapFile.Reader for each map/reduce output file, then pass those readers, the 
partitioner the job used, the key you want, and a value instance to be read into.
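
Something like the following should work.  It's only a sketch: I'm assuming 
Text keys and values and the default HashPartitioner here, so substitute 
whatever your job actually used, plus your real output path.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.MapFileOutputFormat;
import org.apache.hadoop.mapred.lib.HashPartitioner;

public class SingleKeyLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path outputDir = new Path(args[0]);          // the job's output directory

    // One MapFile.Reader per part-* output of the job.
    MapFile.Reader[] readers =
      MapFileOutputFormat.getReaders(fs, outputDir, conf);

    // The same partitioner the job used, so the lookup hits the right file.
    HashPartitioner<Text, Text> partitioner = new HashPartitioner<Text, Text>();

    Text key = new Text(args[1]);
    Text value = new Text();                     // value to be read into

    Writable entry =
      MapFileOutputFormat.getEntry(readers, partitioner, key, value);
    System.out.println(entry == null ? "not found" : value.toString());

    for (MapFile.Reader reader : readers) {
      reader.close();
    }
  }
}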

A MapFile maintains an index of keys, so the entire file need not be 
scanned.  If you really only need the value of a single key, then you might 
want to avoid opening all of the output files.  In that case you could 
use the Partitioner and the MapFile API directly, as in the sketch below.
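
Roughly: use the partitioner to compute which partition the key hashed to, 
then open only that part file's MapFile.  Again this is just a sketch; I'm 
assuming Text keys/values, the default HashPartitioner, and the standard 
part-NNNNN file naming, with the number of reduces your job ran with.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.HashPartitioner;

public class SinglePartitionLookup {
  public static Writable lookup(Path outputDir, Text key, int numReduces,
                                Configuration conf) throws Exception {
    FileSystem fs = FileSystem.get(conf);

    // Work out which reduce partition (and hence which part file) got the key.
    HashPartitioner<Text, Text> partitioner = new HashPartitioner<Text, Text>();
    int part = partitioner.getPartition(key, null, numReduces);

    // Open only that one MapFile and use its index to find the key.
    Path partFile = new Path(outputDir, String.format("part-%05d", part));
    MapFile.Reader reader = new MapFile.Reader(fs, partFile.toString(), conf);
    try {
      Text value = new Text();
      return reader.get(key, value);             // null if the key isn't there
    } finally {
      reader.close();
    }
  }
}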

Doug


Xavier Stevens wrote:
> I am curious how others might be solving this problem.  I want to
> retrieve a record from HDFS based on its key.  Are there any methods
> that can shortcut this type of search to avoid parsing all data until
> you find it?  Obviously HBase would do this as well, but I wanted to
> know if there is a way to do it using just Map/Reduce and HDFS.
> 
> Thanks,
> 
> -Xavier
> 

