hadoop-mapreduce-user mailing list archives

From Adam Phelps <...@opendns.com>
Subject Re: Large static structures in M/R heap
Date Wed, 27 Feb 2013 19:40:32 GMT
We actually use CDBs a good bit outside of M/R.  This is something worth
looking into, but the big structure we're currently using is a giant
tree-based lookup table whose access pattern is pretty random, so I
don't think caching would be of much use.  There is a lesser (but still
large) structure this might work for.

- Adam
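Adam's intuition about random access can be made concrete with a toy simulation (this is not CDB itself, and all sizes and names here are made up): an LRU cache in front of a disk-resident table only pays off when the key distribution is skewed, which is exactly the access-pattern caveat in the quoted reply below.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class CacheHitRate {
    // Simulate an LRU cache of cacheSize entries in front of a table of
    // tableSize keys, and return the fraction of lookups served from cache.
    static double hitRate(int tableSize, int cacheSize, int lookups,
                          Random rnd, boolean skewed) {
        // Access-ordered LinkedHashMap + removeEldestEntry is the standard
        // minimal LRU idiom in the JDK.
        LinkedHashMap<Integer, Boolean> cache =
            new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> e) {
                    return size() > cacheSize;
                }
            };
        int hits = 0;
        for (int i = 0; i < lookups; i++) {
            // skewed: 90% of lookups go to 10% of the keys; otherwise uniform
            int key = (skewed && rnd.nextInt(10) < 9)
                    ? rnd.nextInt(tableSize / 10)
                    : rnd.nextInt(tableSize);
            if (cache.get(key) != null) hits++;
            else cache.put(key, Boolean.TRUE);
        }
        return hits / (double) lookups;
    }

    public static void main(String[] args) {
        // Cache holds 10% of the table. Uniform traffic hits roughly at the
        // cache/table ratio; skewed traffic does far better.
        System.out.printf("uniform: %.2f%n",
            hitRate(100_000, 10_000, 200_000, new Random(42), false));
        System.out.printf("skewed:  %.2f%n",
            hitRate(100_000, 10_000, 200_000, new Random(7), true));
    }
}
```

Under uniform random access the hit rate stays near cacheSize/tableSize, which is why a tree-shaped table with a "pretty random" access pattern gains little from keeping only hot parts in memory.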

On 2/27/13 10:56 AM, Robert Evans wrote:
> Have you looked at things like CDB (http://cr.yp.to/cdb.html) that would
> allow you to keep most of the file on disk and cache the hot parts in
> memory?  Whether that helps really depends on your access pattern.
> Alternatively you could give yourself more heap and take up two slots for
> your map task.
> Also, if it is big enough you might want to look at using a reduce to do
> the join instead of trying to do a map-side join.
> --Bobby
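The reduce-side join Bobby mentions can be sketched with the Hadoop scaffolding stripped out. The tagging scheme ("L:"/"R:" prefixes) and record shapes below are hypothetical, and shuffle() merely stands in for the framework's sort-and-group step; in a real job, map() would emit the tagged values and reduce() would receive each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoin {
    // Stand-in for the shuffle: group tagged values from both inputs by key.
    // "L:" marks a lookup-table entry, "R:" marks a data record.
    static Map<String, List<String>> shuffle(Map<String, String> lookup,
                                             List<String[]> records) {
        Map<String, List<String>> groups = new TreeMap<>();
        lookup.forEach((k, v) ->
            groups.computeIfAbsent(k, x -> new ArrayList<>()).add("L:" + v));
        for (String[] r : records)
            groups.computeIfAbsent(r[0], x -> new ArrayList<>()).add("R:" + r[1]);
        return groups;
    }

    // Stand-in for reduce(): emit one joined line per (lookup, record) pair.
    static List<String> reduce(Map<String, List<String>> groups) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            String lookupVal = null;
            List<String> rights = new ArrayList<>();
            for (String v : e.getValue()) {
                if (v.startsWith("L:")) lookupVal = v.substring(2);
                else rights.add(v.substring(2));
            }
            if (lookupVal != null)
                for (String r : rights)
                    out.add(e.getKey() + "\t" + r + "\t" + lookupVal);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = Map.of("a", "alpha", "b", "beta");
        List<String[]> recs = List.of(new String[]{"a", "1"},
                                      new String[]{"c", "2"});
        // Only key "a" appears in both inputs, so only it joins.
        System.out.println(reduce(shuffle(lookup, recs)));
    }
}
```

The trade is shuffle cost for heap: the lookup table streams through the framework instead of living in every map task's memory, which is exactly the appeal when the in-heap structure has outgrown the task's memory budget.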
> On 2/27/13 12:42 PM, "Adam Phelps" <amp@opendns.com> wrote:
>> We have a job that uses a large lookup structure that gets created as a
>> static class during the map setup phase (and we have the JVM reused so
>> this only takes place once).  However, of late this structure has grown
>> drastically (due to factors beyond our control) and we've seen a
>> substantial increase in map time due to the reduced available memory.
>> Are there any easy solutions to this sort of problem?  My first thought
>> was to see if it was possible to have all tasks for a job execute in
>> parallel within the same JVM, but I'm not seeing any setting that would
>> allow that.  Beyond that my only idea is to move that data into an
>> external one-per-node key-value store like memcached, but I'm worried
>> the additional overhead of sending a query for each value being mapped
>> would also kill the job performance.
>> - Adam
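For reference, the pattern Adam describes boils down to a guarded static field, so that with JVM reuse enabled (mapred.job.reuse.jvm.num.tasks=-1 in the 1.x configuration) only the first task in each child JVM pays the construction cost. Class, field, and entry names below are hypothetical, and in a real job the guard would live in Mapper.setup():

```java
import java.util.HashMap;
import java.util.Map;

public class SharedLookup {
    // Static so the table survives across task attempts in a reused JVM.
    private static volatile Map<String, String> TABLE;
    static int buildCount = 0; // exposed only to demonstrate single init

    // Double-checked lazy init: build the table at most once per JVM.
    static Map<String, String> table() {
        if (TABLE == null) {
            synchronized (SharedLookup.class) {
                if (TABLE == null) {
                    buildCount++;
                    Map<String, String> t = new HashMap<>();
                    t.put("198.51.100.7", "example-org"); // stand-in entries
                    TABLE = t;
                }
            }
        }
        return TABLE;
    }

    public static void main(String[] args) {
        // Simulate three reused "tasks" in one JVM: three lookups, one build.
        for (int task = 0; task < 3; task++)
            table().get("198.51.100.7");
        System.out.println("builds=" + buildCount);
    }
}
```

The weakness the thread is circling is visible here too: TABLE is per-JVM, so every concurrently running child JVM on a node still holds its own full copy, which is what pushes people toward an external one-per-node store once the structure gets large enough.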
