hadoop-mapreduce-user mailing list archives

From Amareshwari Sri Ramadasu <amar...@yahoo-inc.com>
Subject Re: MultithreadedMapRunner or MultithreadedMapper?
Date Mon, 17 May 2010 11:50:18 GMT
Hi Juber,

MultithreadedMapper uses the new API that was introduced in branch 0.20, whereas MultithreadedMapRunner
uses the old interface.
MultithreadedMapRunner is deprecated in branch 0.21 through https://issues.apache.org/jira/browse/MAPREDUCE-465.
If you are using branch 0.20, you can use either one of them, but do not use them together.
I would prefer MultithreadedMapper, because the other will be deprecated in subsequent
versions.

Thanks
Amareshwari

On 5/17/10 7:25 AM, "juber patel" <juberpatel@gmail.com> wrote:

Hello,


I am a bit confused about the difference between the MultithreadedMapRunner
and MultithreadedMapper classes. Basically, I have huge "side data" (4GB)
for the map part and I want it in memory. I don't want each mapper to
load its own copy of that data, so I decided to limit each machine to one
mapper and make it multithreaded so that all the cores are
utilized. The side data is read-only and can be shared by all threads.

My question is: which of the MultithreadedMapRunner and
MultithreadedMapper classes should I be using? Or do they have to be used
together (choose MultithreadedMapRunner in the config file and then
extend MultithreadedMapper for map tasks)? I notice that one is in the
mapred package and the other is in the mapreduce package, but neither is
deprecated. I can use the latest version of Hadoop since I am just
starting out.


thanks in advance,


Juber
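
The design described in the question (one process per machine, read-only side data loaded once, several threads sharing it) can be illustrated without Hadoop. The sketch below uses only java.util.concurrent and is not Hadoop code: the class name SharedSideDataSketch, the tiny SIDE_DATA table, and the lookup and processAll methods are hypothetical stand-ins for the 4GB side data and the map logic. It only shows that an immutable structure can be read by many threads without each thread holding its own copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedSideDataSketch {

    // Hypothetical stand-in for the large side data: built once per JVM,
    // wrapped as unmodifiable, and never written to afterwards.
    private static final Map<String, Integer> SIDE_DATA;
    static {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        SIDE_DATA = Collections.unmodifiableMap(m);
    }

    // Hypothetical "map" step: looks up one record in the shared table
    // and returns 0 for records that are not present.
    static int lookup(String record) {
        Integer value = SIDE_DATA.get(record);
        return value == null ? 0 : value;
    }

    // Runs lookup() for every record on a fixed-size thread pool and
    // returns the results in the same order as the input.
    static List<Integer> processAll(List<String> records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (final String record : records) {
                Callable<Integer> task = () -> lookup(record);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // One worker thread per available core, all reading the same table.
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(processAll(Arrays.asList("a", "c", "x", "b"), threads));
    }
}
```

Sharing is safe here only because the table is never modified after it is built; side data that changes during processing would need synchronization or a concurrent collection.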

