hadoop-common-user mailing list archives

From Chris K Wensel <ch...@wensel.net>
Subject Re: "Lookup" HashMap available within the Map
Date Tue, 25 Nov 2008 20:24:52 GMT
Hey Tim

The .configure() method is what you are looking for, I believe.

It is called once per task, which, in the default case, is once per JVM.

Note that jobs are broken into parallel tasks; each task handles a portion
of the input data. So you may create your map 100 times, because there are
100 tasks, but it will only be created once per JVM.
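Something along these lines might work (an untested sketch; the class name,
key/value types and the tab-separated lookup file are just placeholders, and
it assumes the lookup file is the first entry in the DistributedCache). A
static map, guarded so it is only built once per JVM, is populated in
configure() and then read from map():

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LookupMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // shared by every task that runs in this JVM
  private static Map<String, String> lookup;

  public void configure(JobConf conf) {
    synchronized (LookupMapper.class) {
      if (lookup != null)
        return;                      // already built by an earlier task in this JVM
      lookup = new HashMap<String, String>();
      try {
        // assumes the file was added to the DistributedCache at job submission
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        BufferedReader in =
            new BufferedReader(new FileReader(cached[0].toString()));
        String line;
        while ((line = in.readLine()) != null) {
          String[] parts = line.split("\t", 2);   // e.g. key<TAB>value
          lookup.put(parts[0], parts[1]);
        }
        in.close();
      } catch (IOException e) {
        throw new RuntimeException("failed to load lookup index", e);
      }
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    String hit = lookup.get(value.toString());
    if (hit != null)
      out.collect(value, new Text(hit));
  }
}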

I hope this makes sense.


On Nov 25, 2008, at 11:46 AM, tim robertson wrote:

> Hi Doug,
> Thanks - it is not so much that I want to run in a single JVM - I do
> want a bunch of machines doing the work; it is just that I want them
> all to have this in-memory lookup index, configured once per job.  Is
> there some hook somewhere that lets me trigger a read from the
> distributed cache, or is Mapper.configure() the best place for this?
> Can it be called multiple times per job, meaning I need to keep some
> static synchronised indicator flag?
> Thanks again,
> Tim
> On Tue, Nov 25, 2008 at 8:41 PM, Doug Cutting <cutting@apache.org> wrote:
>> tim robertson wrote:
>>> Thanks Alex - this will allow me to share the shapefile, but I need
>>> to read it, parse it and store the objects in the index only once
>>> per job per JVM.
>>> Is Mapper.configure() the best place to do this?  E.g. will it only
>>> be called once per job?
>> In 0.19, with HADOOP-249, all tasks from a job can be run in a single JVM.
>> So, yes, you could access a static cache from Mapper.configure().
>> Doug
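For what it's worth, the JVM reuse that HADOOP-249 added has to be switched
on when the job is submitted. A minimal, untested sketch of the driver side
(the lookup file path is just an example), assuming the 0.19 JobConf API:

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class SubmitLookupJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(LookupMapper.class);
    // HADOOP-249: -1 means no limit, so all of this job's tasks on a node
    // share one JVM and the static map built in configure() is loaded once
    conf.setNumTasksToExecutePerJvm(-1);
    // ship the lookup file to every node via the DistributedCache
    DistributedCache.addCacheFile(new Path("/shared/lookup.tsv").toUri(), conf);
    // ... set input/output paths, mapper class, etc., then JobClient.runJob(conf)
  }
}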

Chris K Wensel
