hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Global object for a map task
Date Sun, 03 May 2009 06:53:48 GMT
If the dataset is relatively small, you can pass it via the JobConf object by
storing a serialized version of it there.
If it is larger, you can pass a serialized version via the distributed cache.
In either case, your map task will need to deserialize the object in its
configure method.
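
As a rough illustration of the JobConf route, here is a minimal sketch that
turns an ArrayList into a string and back using only the JDK. The class name
SharedListCodec and the key "example.shared.list" are made up for this
example. In a real job, the assumption is that the driver would store the
encoded string with JobConf.set(KEY, ...) and the mapper would read it back
with JobConf.get(KEY) inside configure; those Hadoop calls appear only in
comments so the snippet compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;

public class SharedListCodec {
    // Hypothetical configuration key, chosen for this example only.
    public static final String KEY = "example.shared.list";

    // Driver side: serialize the list and Base64-encode it so it fits in a
    // string-valued configuration property, e.g. conf.set(KEY, encode(list)).
    public static String encode(ArrayList<String> list) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(list);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Task side: rebuild the list once, e.g. from conf.get(KEY) inside the
    // mapper's configure method, and keep it in a field for use by map().
    @SuppressWarnings("unchecked")
    public static ArrayList<String> decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (ArrayList<String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The decoded list is a fresh copy in each task, so changes made to it in one
map task are not visible to any other task.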

Neither of these methods gives you an object that is write-shared between
map tasks; each task gets its own private copy.

Please remember that the map tasks execute in separate JVMs on distinct
machines in the normal MapReduce environment.
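
Because every task lives in its own JVM, the practical pattern is to load a
read-only copy once per task. The sketch below assumes the dataset is a
newline-delimited text file that is already on the task's local disk (for
example, a file delivered through the distributed cache; looking up that
local path is not shown here). The class name LocalLookupLoader is
hypothetical, and Path below is java.nio.file.Path, not Hadoop's Path class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

public class LocalLookupLoader {
    // Read every line of a local file into memory and return it as a
    // read-only list. Intended to be called once per task (for example from
    // the mapper's configure method) rather than once per key/value pair.
    public static List<String> load(Path localFile) {
        try {
            List<String> lines =
                Files.readAllLines(localFile, StandardCharsets.UTF_8);
            // Wrapping the list makes it explicit that the data is not a
            // channel for sharing writes between tasks.
            return Collections.unmodifiableList(lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning an unmodifiable list is a design choice for the example: it fails
fast if map code tries to treat the per-task copy as shared mutable state.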

On Sat, May 2, 2009 at 10:59 PM, Amandeep Khurana <amansk@gmail.com> wrote:

> How can I create a global variable for each node running my map task? For
> example, a common ArrayList that my map function can access for every k,v
> pair it works on. It doesn't really need to create the ArrayList every time.
> If I create it in the main function of the job, the map function gets a
> null pointer exception. Where else can this be created?
> Amandeep
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz

