hadoop-common-user mailing list archives

From Chris K Wensel <ch...@wensel.net>
Subject Re: "Lookup" HashMap available within the Map
Date Tue, 25 Nov 2008 20:42:56 GMT
Cool. If you need a hand with Cascading, feel free to ping me on the
mailing list or in #cascading on IRC. There are lots of other friendly
folk there already.

ckw

On Nov 25, 2008, at 12:35 PM, tim robertson wrote:

> Thanks Chris,
>
> I have a different test running, and will then implement that. I might
> give Cascading a shot for what I am doing.
>
> Cheers
>
> Tim
>
>
> On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel <chris@wensel.net> wrote:
>> Hey Tim
>>
>> The .configure() method is what you are looking for, I believe.
>>
>> It is called once per task, which in the default case is once per JVM.
>>
>> Note that jobs are broken into parallel tasks, and each task handles a
>> portion of the input data. So you may create your map 100 times because
>> there are 100 tasks, but it will only be created once per JVM.
>>
>> I hope this makes sense.
>>
>> chris
>>
>> On Nov 25, 2008, at 11:46 AM, tim robertson wrote:
>>
>>> Hi Doug,
>>>
>>> Thanks - it is not so much that I want to run in a single JVM. I do
>>> want a bunch of machines doing the work; I just want them all to have
>>> this in-memory lookup index, configured once per job. Is there a hook
>>> somewhere from which I can trigger a read from the distributed cache,
>>> or is Mapper.configure() the best place for this? Can it be called
>>> multiple times per job, meaning I need to keep some static
>>> synchronised indicator flag?
>>>
>>> Thanks again,
>>>
>>> Tim
>>>
>>>
>>> On Tue, Nov 25, 2008 at 8:41 PM, Doug Cutting <cutting@apache.org> wrote:
>>>>
>>>> tim robertson wrote:
>>>>>
>>>>> Thanks Alex - this will allow me to share the shapefile, but I need
>>>>> to read it, parse it and store the objects in the index only once
>>>>> per job per JVM.
>>>>> Is Mapper.configure() the best place to do this? E.g. will it
>>>>> only be called once per job?
>>>>
>>>> In 0.19, with HADOOP-249, all tasks from a job can be run in a single
>>>> JVM.
>>>> So, yes, you could access a static cache from Mapper.configure().
>>>>
>>>> Doug
>>>>
>>>>
>>
>> --
>> Chris K Wensel
>> chris@wensel.net
>> http://chris.wensel.net/
>> http://www.cascading.org/
>>
>>
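
The pattern discussed in this thread (build the lookup once per JVM and reach it from configure()) can be sketched roughly as follows. This is a minimal illustration, not code from any of the posters. It is plain Java with no Hadoop dependency so that it runs on its own. LookupHolder, SketchMapper and LookupDemo are hypothetical names, the two map entries are made-up placeholder data, and load() stands in for reading and parsing the file shipped via the distributed cache. In a real job using the old mapred API, the hook would be configure(JobConf) rather than the no-argument configure() used here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for the shared lookup; not part of Hadoop.
final class LookupHolder {
    // Static, so every task that runs in this JVM sees the same instance.
    private static Map<String, String> lookup;
    // Counts how often the expensive load actually ran (for illustration only).
    private static final AtomicInteger LOADS = new AtomicInteger();

    private LookupHolder() {}

    // synchronized so that two callers in one JVM cannot both run the load.
    static synchronized Map<String, String> get() {
        if (lookup == null) {
            lookup = Collections.unmodifiableMap(load());
        }
        return lookup;
    }

    // Stand-in for reading and parsing the cached file; placeholder data only.
    private static Map<String, String> load() {
        LOADS.incrementAndGet();
        Map<String, String> m = new HashMap<>();
        m.put("DK", "Denmark");
        m.put("NZ", "New Zealand");
        return m;
    }

    static int loadCount() {
        return LOADS.get();
    }
}

// Stand-in for a Mapper; configure() plays the role of configure(JobConf).
final class SketchMapper {
    private Map<String, String> lookup;

    // Called once per task; cheap after the first call in a given JVM.
    void configure() {
        lookup = LookupHolder.get();
    }

    String map(String key) {
        return lookup.getOrDefault(key, "unknown");
    }
}

final class LookupDemo {
    public static void main(String[] args) {
        // Simulate three tasks running one after another in the same JVM.
        for (int task = 0; task < 3; task++) {
            SketchMapper mapper = new SketchMapper();
            mapper.configure();
            System.out.println(mapper.map("DK"));
        }
        System.out.println("loads: " + LookupHolder.loadCount());
    }
}
```

With one JVM per task (the default case Chris describes), the static field is simply empty at the start of each task, so the load runs once per task. With JVM reuse as Doug describes for 0.19, later tasks in the same JVM find the map already built. The synchronized accessor corresponds to the "static synchronised indicator flag" Tim asks about: it costs little and keeps the load from running twice if configure() is ever called concurrently in one JVM.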

