hadoop-common-dev mailing list archives

From "Goel, Ankur" <ankur.g...@corp.aol.com>
Subject RE: Multithreaded reduce
Date Tue, 09 Sep 2008 09:53:52 GMT
My implementation is a bit different. I am not using the multithreaded
reduce runner; instead, I use thread pools inside each of my reduce tasks
to do the DB and HDFS I/O. To give you an example from my setup, I have 3
reduce tasks, each with a DB thread pool of 70 threads. This keeps the
number of threads hitting the DB with inserts into multiple tables at a
maximum of roughly 200 (3 × 70 = 210).
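
A minimal sketch of the per-reduce-task DB pool described above, using plain java.util.concurrent. It assumes one fixed pool of 70 threads per reduce task; `insertRow` is a hypothetical stand-in for the JDBC insert, so no real database is touched, and the class and method names are illustrative only. `maxDbThreads` shows the job-wide bound: 3 tasks × 70 threads = 210 concurrent inserts at most.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: each reduce task owns one fixed-size pool for its DB inserts.
class DbPoolSketch {
    // Job-wide bound on concurrent inserts: every reduce task runs at most poolSize at once.
    static int maxDbThreads(int reduceTasks, int poolSize) {
        return reduceTasks * poolSize;
    }

    // Hypothetical stand-in for a JDBC insert; returns the result to be written to HDFS later.
    static String insertRow(String table, String row) {
        return table + ":" + row;
    }

    // Submits one insert per row to the task's pool and gathers results in submission order.
    static List<String> insertAll(ExecutorService dbPool, String table, List<String> rows)
            throws InterruptedException, ExecutionException {
        List<Future<String>> futures = new ArrayList<>();
        for (String row : rows) {
            Callable<String> insert = () -> insertRow(table, row);
            futures.add(dbPool.submit(insert));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get()); // waits for that insert; a failure surfaces as ExecutionException
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService dbPool = Executors.newFixedThreadPool(70); // 70 threads per reduce task
        try {
            System.out.println(insertAll(dbPool, "clicks", List.of("a", "b", "c")));
            System.out.println(maxDbThreads(3, 70)); // 3 reduce tasks x 70 threads
        } finally {
            dbPool.shutdown();
        }
    }
}
```

The pool size, not the number of rows, is what limits load on MySQL: extra inserts simply queue until one of the 70 threads is free.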

I set up MySQL with the large configuration, and this really makes the
inserts go at breakneck speed.

Now, each of the threads returns a result that I want to collect on
HDFS, so I tried collecting the results via the outputCollector from
these threads, which gave me the same exception. I also tried
synchronizing the outputCollector, which did not help.

So I then decided to use a separate thread pool in each reduce task for
doing output collection via the outputCollector. When this pool had only
1 thread, the exception did not occur; with 5 or more threads, the
exception showed up.
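
A sketch of the single-writer arrangement that avoided the exception: worker threads do the per-item work in parallel, but every collect() call is handed to a one-thread executor, so the collector only ever sees one calling thread. `ResultCollector` is a hypothetical interface with the same shape as the old-API `OutputCollector.collect(K, V)`, and `ListCollector` is a deliberately non-thread-safe stand-in for the real collector; this is an illustration under those assumptions, not how Hadoop itself is wired.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical interface shaped like the old-API OutputCollector.collect(K, V).
interface ResultCollector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Deliberately NOT thread-safe, to stand in for a collector that must see one caller at a time.
class ListCollector implements ResultCollector<String, Integer> {
    final List<String> out = new ArrayList<>();

    @Override
    public void collect(String key, Integer value) {
        out.add(key + "=" + value);
    }
}

class SingleWriterSketch {
    // Parallel workers, one writer thread: only the writer ever calls collect().
    static List<String> run(int workers, int items) {
        ListCollector collector = new ListCollector();
        ExecutorService workPool = Executors.newFixedThreadPool(workers);
        ExecutorService writer = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < items; i++) {
                final int id = i;
                workPool.execute(() -> {
                    int result = id * id; // stands in for the per-item DB work
                    // Hand the result to the writer instead of calling collect() here.
                    writer.execute(() -> collector.collect("k" + id, result));
                });
            }
            workPool.shutdown();
            if (!workPool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            // Queued behind every collect() task (the writer runs tasks in FIFO order);
            // Future.get() makes the writer thread's list contents visible here.
            return writer.submit(() -> new ArrayList<String>(collector.out)).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            workPool.shutdownNow();
            writer.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> collected = run(5, 100);
        System.out.println(collected.size() + " results collected by one writer thread");
    }
}
```

The trade-off is that output collection becomes serial; that is usually acceptable when the expensive part (the DB inserts) stays parallel.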

I'll post the stack trace after reproducing the problem.


-----Original Message-----
From: Alejandro Abdelnur [mailto:tucu00@gmail.com] 
Sent: Tuesday, September 09, 2008 9:15 AM
To: core-dev@hadoop.apache.org
Subject: Re: Multithreaded reduce

Collectors are already properly synchronized. Maybe there is a race
condition in the way the multithreaded reducer runner creates them.


On Tue, Sep 9, 2008 at 8:56 AM, Owen O'Malley <omalley@apache.org> wrote:
> On Sep 8, 2008, at 4:12 AM, Goel, Ankur wrote:
>> They do not seem to work when used in the reduce phase.
>> I can post the stack trace if required.
> I believe it. I don't think I've ever seen anyone do a multi-threaded
> reduce. Of course, the answer is easy: just add synchronization around
> the output collector before calling collect.
> -- Owen
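
Owen's suggestion, sketched under the same assumptions: a wrapper that serializes collect() calls on one lock before delegating. `KVCollector` and `SynchronizedCollector` are hypothetical names, not Hadoop classes. Such a wrapper only serializes callers that share the same wrapper instance; any code that reaches the underlying collector another way bypasses the lock.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical interface shaped like the old-API OutputCollector.collect(K, V).
interface KVCollector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Serializes collect() calls on one private lock before delegating.
final class SynchronizedCollector<K, V> implements KVCollector<K, V> {
    private final KVCollector<K, V> delegate;
    private final Object lock = new Object();

    SynchronizedCollector(KVCollector<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void collect(K key, V value) throws IOException {
        synchronized (lock) { // only callers sharing this wrapper instance are serialized
            delegate.collect(key, value);
        }
    }
}

class SyncCollectDemo {
    // Many threads share one wrapper around a non-thread-safe ArrayList.
    static int collectFromThreads(int threads, int items) {
        List<String> backing = new ArrayList<>();
        KVCollector<String, Integer> shared =
                new SynchronizedCollector<String, Integer>((k, v) -> backing.add(k + "=" + v));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < items; i++) {
                final int id = i;
                pending.add(pool.submit(() -> {
                    shared.collect("k" + id, id);
                    return null; // Callable, so collect()'s IOException may propagate
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any collect() failure; also makes the adds visible here
            }
            return backing.size();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(collectFromThreads(8, 1000) + " values collected");
    }
}
```

Without the wrapper, concurrent ArrayList.add calls can lose elements or throw, which is the kind of failure the lock prevents.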
