hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: how to create collections in the mapper class
Date Fri, 28 Dec 2007 18:12:47 GMT

This sounds like there is a little bit of confusion going on here.

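It is common for people who are just starting with Hadoop to be surprised
when static fields of the mapper are not shared across all parallel
instances of the map function.  This is, of course, because you are running
many mappers, typically each in its own JVM and often on different machines,
so every map task has its own separate copy of any static field.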

Usually when people say what you are saying, the reason is that they are
trying to do something like removing duplicate elements.  The best way to do
that is NOT to put state into the map function, but rather to let the
reduce and sort phases do the work.  A good example is finding all of the
unique words in a set of documents.  If you just use a word-counting program,
you already get what you want: the keys of its output are exactly the unique
words.  If you want a list of unique words per day, then you simply change
the mapper so that it outputs a key containing both the word and the day,
and do the count as before.
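
To make the word-count idea concrete, here is a minimal sketch in Python (not Hadoop's Java API) that simulates one map/reduce job in memory. The `run_job` helper, the mapper/reducer functions and the sample documents are hypothetical illustrations; in a real job the framework itself does the sorting and grouping between the two phases. The only difference between the plain unique-words job and the per-day job is the key the mapper emits.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one job: map, group by sorted key, reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    # The framework sorts keys before the reduce phase; mimic that here.
    return [out for key in sorted(grouped) for out in reducer(key, grouped[key])]

def word_count_mapper(record):
    _day, text = record
    for word in text.split():
        yield word, 1  # key is just the word

def word_day_mapper(record):
    day, text = record
    for word in text.split():
        yield (day, word), 1  # composite key: the only change from word count

def sum_reducer(key, values):
    yield key, sum(values)

# Hypothetical sample input: (day, document text) pairs.
docs = [
    ("2007-12-27", "the cat sat"),
    ("2007-12-27", "the dog sat"),
    ("2007-12-28", "the cat ran"),
]

# The keys of the word-count output are exactly the unique words.
unique_words = [word for word, _count in run_job(docs, word_count_mapper, sum_reducer)]
print(unique_words)  # prints ['cat', 'dog', 'ran', 'sat', 'the']

# Same count, composite key: the keys are the unique words per day.
per_day = run_job(docs, word_day_mapper, sum_reducer)
print(per_day)
```

Putting the day first in the composite key is a choice made here so that the sorted output comes out grouped by day; a word-then-day key deduplicates just as well.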

Remember also that your program may contain several map/reduce steps.
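
As a sketch of chaining steps, again in Python with a hypothetical in-memory `run_job` helper standing in for the framework: the first job deduplicates (day, word) pairs, and its output becomes the input of a second job that counts the distinct words per day. Neither step keeps any state in the mapper; the sample data is made up for illustration.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one job: map, group by sorted key, reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return [out for key in sorted(grouped) for out in reducer(key, grouped[key])]

# Step 1: deduplicate (day, word) pairs. The reducer is called once per
# distinct key, so it only has to re-emit the key.
def pair_mapper(record):
    day, text = record
    for word in text.split():
        yield (day, word), None

def distinct_reducer(key, _values):
    yield key

# Step 2: count how many distinct words appeared on each day.
def day_mapper(pair):
    day, _word = pair
    yield day, 1

def sum_reducer(key, values):
    yield key, sum(values)

docs = [
    ("2007-12-27", "the cat sat"),
    ("2007-12-27", "the dog sat"),
    ("2007-12-28", "the cat ran"),
]

unique_pairs = run_job(docs, pair_mapper, distinct_reducer)  # output of job 1 ...
per_day = run_job(unique_pairs, day_mapper, sum_reducer)     # ... is the input of job 2
print(per_day)  # prints [('2007-12-27', 4), ('2007-12-28', 3)]
```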

Perhaps if you say more about what you are trying to do, it would be easier
to help you.


On 12/28/07 6:35 AM, "helena21" <ahelen19@gmail.com> wrote:

> 
> Hi everybody,
> 
> I want to create an ArrayList in the mapper class that collects some objects
> from the input, so that I can use this collection to filter my input. The
> problem is that my ArrayList never holds even one object; its size is always
> zero. Please, please point me to how I can create an ArrayList or other
> collection objects. I made it a static object, but the ArrayList still can't
> collect any objects.
> 
> Thanks
> Helen

