hadoop-common-dev mailing list archives

From "Alejandro Abdelnur" <tuc...@gmail.com>
Subject Re: Multithreaded reduce
Date Tue, 09 Sep 2008 10:29:03 GMT
AFAIK there is no multithreaded reducer runner.

You have to make sure that each output collector is created only once,
with no race condition in its creation.
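
To illustrate the create-once point, here is a minimal sketch in plain Java.
It does not use the Hadoop API: NamedCollector, CollectorRegistry and
CreateOnceDemo are hypothetical names made up for this example, and the
collector just appends to an in-memory list. It assumes a Java version that
has ConcurrentHashMap.computeIfAbsent, which applies the factory at most once
per key even when many threads ask for the same name at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a named output collector; not a Hadoop class.
class NamedCollector {
    private final List<String> records = new ArrayList<>();

    // collect() is synchronized, so concurrent callers cannot corrupt the list.
    synchronized void collect(String key, String value) {
        records.add(key + "\t" + value);
    }

    synchronized int size() {
        return records.size();
    }
}

// Hands out one collector per name, creating it lazily and only once.
class CollectorRegistry {
    private final Map<String, NamedCollector> collectors = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    NamedCollector get(String name) {
        // The factory lambda runs at most once per key, atomically.
        return collectors.computeIfAbsent(name, n -> {
            created.incrementAndGet();
            return new NamedCollector();
        });
    }

    int createdCount() {
        return created.get();
    }
}

class CreateOnceDemo {
    // Starts `threads` workers that all write to the collector named "results".
    static CollectorRegistry run(int threads, int recordsPerThread) {
        CollectorRegistry registry = new CollectorRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    registry.get("results").collect("thread-" + id, "row-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return registry;
    }

    public static void main(String[] args) {
        CollectorRegistry registry = run(8, 1000);
        System.out.println("collectors created: " + registry.createdCount());
        System.out.println("records collected: " + registry.get("results").size());
    }
}
```

A check-then-put on a plain HashMap in place of computeIfAbsent would let two
threads create two collectors for the same name, which is the kind of race
described above.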


On Tue, Sep 9, 2008 at 3:23 PM, Goel, Ankur <ankur.goel@corp.aol.com> wrote:
> Folks,
>      My implementation is a bit different. I am not using a multithreaded
> reduce runner. Instead I am using thread pools to do DB and HDFS I/O from
> each of my reduce tasks. To give you an example from my setup, I have 3
> reduce tasks, each with a DB thread pool of size 70 threads. This is to
> ensure that I have a maximum of 200 threads hitting the DB doing inserts
> into multiple tables.
> With MySQL set up with a large configuration, this really makes the
> inserts go at breakneck speed.
> Now each of the threads returns a result that I want to collect on HDFS,
> so I tried collecting the result via outputCollector from these threads,
> which gave me the same exception. I also tried synchronizing the
> outputCollector, which did not help.
> So then I decided to use a separate thread pool in each reduce task for
> doing output collection via outputCollector. When this pool was set to
> have only 1 thread, the exception did not occur. Setting it to 5 threads
> or more caused the exception to show up.
> I'll post the stack trace after reproducing the problem.
> Thanks
> -Ankur
> -----Original Message-----
> From: Alejandro Abdelnur [mailto:tucu00@gmail.com]
> Sent: Tuesday, September 09, 2008 9:15 AM
> To: core-dev@hadoop.apache.org
> Subject: Re: Multithreaded reduce
> Collectors are already properly synchronized. Maybe there is a race
> condition in the way the multithreaded reducer runner creates them.
> A
> On Tue, Sep 9, 2008 at 8:56 AM, Owen O'Malley <omalley@apache.org> wrote:
>> On Sep 8, 2008, at 4:12 AM, Goel, Ankur wrote:
>>> They do not seem to work properly when used in the Reduce phase.
>>> I can post the stack trace if required.
>> I believe it. I don't think I've ever seen anyone do a multi-threaded
>> reduce. Of course the answer is easy, just add synchronization around the
>> output collector before calling collect.
>> -- Owen
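
A minimal sketch of the "synchronize around collect" suggestion, in plain
Java. It is not tied to the Hadoop API: SimpleCollector, ListCollector and
SynchronizedCollectDemo are hypothetical names for this example, and the
collector is deliberately not thread-safe. Worker threads do their own work
without any lock and take the lock on the shared collector only for the
collect call itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an output collector; not a Hadoop interface.
interface SimpleCollector {
    void collect(String key, String value);
}

// Deliberately NOT thread-safe: ArrayList must not be mutated concurrently.
class ListCollector implements SimpleCollector {
    final List<String> records = new ArrayList<>();

    public void collect(String key, String value) {
        records.add(key + "\t" + value);
    }
}

class SynchronizedCollectDemo {
    // Runs `threads` workers that each emit `perThread` records.
    static int run(int threads, int perThread) {
        final ListCollector output = new ListCollector();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Placeholder for the real per-record work (e.g. a DB insert).
                    String result = "row-" + i;
                    // Every writer locks the same object, so collect calls never overlap.
                    synchronized (output) {
                        output.collect("thread-" + id, result);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return output.records.size();
    }

    public static void main(String[] args) {
        System.out.println("records collected: " + run(5, 1000));
    }
}
```

This only helps if every thread that touches the collector synchronizes on
the same object; a lock held by some writers but not others gives no
protection.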
