hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: coding question: user's global variables
Date Sat, 13 Oct 2007 17:29:18 GMT


The easy way is to put this initialization into the construction of the map
or reduce object.  Each map would have a private copy separate from every
other private copy, but since maps get called many, many times this
construction cost is, on average, small.
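
As a rough illustration of the idea above, here is a minimal sketch in plain Java. It does not use any Hadoop API: LookupMapper, its map method, and lookupSize are hypothetical names standing in for a real mapper class, and the lines passed to the constructor stand in for the contents of a file that every task would have to be able to read. The only point is that each object builds its own read-only list once, and every later call reuses it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a map task; no Hadoop classes are involved. */
class LookupMapper {
    // Private, read-only copy owned by this one mapper object.
    private final List<String> lookup;

    /** Builds the list once, when the mapper object is constructed. */
    LookupMapper(Iterable<String> sourceLines) {
        List<String> loaded = new ArrayList<>();
        for (String line : sourceLines) {
            loaded.add(line.trim());
        }
        // Wrap the list so no call to map() can modify it.
        this.lookup = Collections.unmodifiableList(loaded);
    }

    /** Called once per record; reuses the list built in the constructor. */
    boolean map(String record) {
        return lookup.contains(record);
    }

    /** Number of entries loaded at construction time. */
    int lookupSize() {
        return lookup.size();
    }
}
```

Two LookupMapper objects built from the same lines hold equal but separate lists, which is all a read-only use needs.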


On 10/13/07 12:32 AM, "James Yu" <jamesyu.dev@gmail.com> wrote:

> Ted,
> 
> Thanks for your explanation.
> Actually I ran into a coding situation where my map function (or all map
> functions on the distributed machines) needs to use (read-only in my case) an
> ArrayList which I populate according to the content of a file at the
> launching of the whole program.  I needed to make sure all map functions
> (and even reduce functions) can see the same copy of that ArrayList.
> What is the proper way to do this?
> 
> --James
> 
> On 10/12/07, Ted Dunning <tdunning@veoh.com> wrote:
>> 
>> 
>> 
>> If you can do with read only constants, then you can define static finals
>> somewhere or other.  They won't really be global, but since you never change
>> them, that won't matter.
>> 
>> If you just want global status indicators, then look at what the reporter
>> provides.
>> 
>> If you really want read/write global variables, then you have a real
>> problem.  In fact, that is the shared memory emulation problem all over
>> again and that is what map-reduce is intended to side step.  Such programs
>> can often be re-written so that you have an extra map reduce step or you
>> have additional input that gets sorted out to the mapper or reducer that
>> needs the values.
>> 
>> If you really, really can't restate your program in this fashion, then you
>> probably don't have a problem that is suitable for map-reduce.  You might be
>> able to make use of something like hbase to give you database-like
>> operations, but you may just have a different kind of problem.  You might be
>> surprised at what a wide variety of problems are amenable to map-reduce
>> formulation.
>> 
>> What is it that makes you want these global variables?
>> 
>> 
>> On 10/12/07 5:09 PM, "James Yu" <jamesyu.dev@gmail.com> wrote:
>> 
>>> What is the best practice if I DO need to have some global variables
>>> accessible to ALL mappers and ALL reducers which are distributed?  Are there
>>> recommendations?
>>> 
>>> -- James
>>> 
>>> On 10/12/07, Owen O'Malley <oom@yahoo-inc.com> wrote:
>>>> 
>>>> On Oct 11, 2007, at 9:54 PM, James Yu wrote:
>>>> 
>>>>> I put all user global variables in a class I called MyGlobals.
>>>> 
>>>> Since map/reduce is distributed in general, you should be careful of
>>>> using global variables. I find it to be better practice to keep all
>>>> of the state variables in either the Mapper or Reducer itself to
>>>> remind myself that it is _not_ shared between Mappers, Reducers, and
>>>> the launching program.
>>>> 
>>>> -- Owen
>>>> 
>> 
>> 

