hadoop-common-user mailing list archives

From "Miles Osborne" <mi...@inf.ed.ac.uk>
Subject Re: Serving contents of large MapFiles/SequenceFiles from memory across many machines
Date Fri, 19 Sep 2008 16:46:52 GMT
The problem here is that you don't want each mapper/reducer to hold its own
copy of the data.  You want that data -- which can be very large --
stored in a distributed manner across your cluster, with random
access to it during the computation.

(This is what HBase and similar systems do.)


2008/9/19 Stuart Sierra <mail@stuartsierra.com>:
> On Thu, Sep 18, 2008 at 1:05 AM, Chris Dyer <redpony@umd.edu> wrote:
>> Basically, I'd like to be able to
>> load the entire contents of a key-value map file in DFS into
>> memory across many machines in my cluster so that I can access any of
>> it with ultra-low latencies.
> I think the simplest way, which I've used, is to put your key-value
> file into DistributedCache, then load it into a HashMap or ArrayList
> in the configure method of each Map/Reduce task.
> -Stuart
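
For reference, the loading step Stuart describes might look roughly like the
sketch below. In the 0.18-era API you would register the file with
DistributedCache in your driver, then call DistributedCache.getLocalCacheFiles(conf)
inside the task's configure() method to find the localized copy; the parsing
itself is ordinary file I/O, shown here against a plain local path. The class
name and the tab-separated file format are assumptions for illustration, not
something from the thread.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: load a tab-separated key<TAB>value file into an
// in-memory HashMap, as you might do in a Map/Reduce task's configure()
// after locating the file via DistributedCache.getLocalCacheFiles(conf).
public class KeyValueCacheLoader {

    static Map<String, String> load(String localPath) throws IOException {
        Map<String, String> map = new HashMap<String, String>();
        BufferedReader in = new BufferedReader(new FileReader(localPath));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without a key/value separator
                }
                map.put(line.substring(0, tab), line.substring(tab + 1));
            }
        } finally {
            in.close();
        }
        return map;
    }
}
```

Note the trade-off Miles points out: every task that loads the file this way
holds a full copy in its own heap, so this only works when the file fits in
each task's memory.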

