hadoop-common-dev mailing list archives

From "Alejandro Abdelnur" <tuc...@gmail.com>
Subject Re: Multithreaded reduce
Date Mon, 08 Sep 2008 10:18:53 GMT
OutputCollectors work fine when used from multiple threads; take a look at MultithreadedMapRunner (org.apache.hadoop.mapred.lib).


On Mon, Sep 8, 2008 at 1:21 PM, Goel, Ankur <ankur.goel@corp.aol.com> wrote:
> Hi Folks,
>
> I have a setup where I am using a thread-pool
> implementation (provided by the java.util.concurrent package) in the reduce
> phase to do database I/O and speed up the database upload. The DB here is
> MySQL. I decided to go for additional parallelism via threads because:
>
> 1. It considerably speeds up the upload while consuming fewer cluster
> resources (i.e. fewer reducers are required).
>
> 2. The upload speed is not limited by the reduce task capacity of the
> cluster but by the DB's ability to handle many simultaneous connections
> effectively.
>
>
>
> Each reduce task has 2 thread pools: one that does the DB I/O and whose
> tasks return a java.util.concurrent.FutureTask, and another that fetches
> the result from this future task to do disk I/O, i.e.
> outputCollector.collect(...).
>
>
>
> When multiple threads from the second pool try to do disk I/O, I get
> an AlreadyBeingCreatedException in the logs. If I set the second pool to
> have only 1 thread, then things work fine!
>
>
>
> It looks like the output collector was never designed to be used from
> multiple threads.
>
>
>
> Any thoughts on this?
>
>
>
> Thanks
>
> -Ankur
>
>
>
>
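
The two-pool arrangement described in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions: Collector is a hypothetical stand-in for Hadoop's OutputCollector (it is not the real interface), uploadToDb is a placeholder for the MySQL write, and the serialized(...) wrapper is one defensive way to make sure only one thread at a time is inside collect(). The thread does not confirm that this is the cause of, or the fix for, the AlreadyBeingCreatedException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TwoPoolSketch {

    /** Hypothetical stand-in for Hadoop's OutputCollector; not the real interface. */
    interface Collector {
        void collect(String key, String value) throws Exception;
    }

    /** Wraps a collector so that only one thread at a time is inside collect(). */
    static Collector serialized(final Collector delegate) {
        final Object lock = new Object();
        return (key, value) -> {
            synchronized (lock) {
                delegate.collect(key, value);
            }
        };
    }

    /** Hypothetical placeholder for the database upload; returns a status string. */
    static String uploadToDb(int record) {
        return "uploaded-" + record;
    }

    static List<String> run(int collectThreads, int records) throws Exception {
        // Deliberately not thread-safe: all writes go through the serialized wrapper.
        final List<String> sink = new ArrayList<>();
        final Collector out = serialized((k, v) -> sink.add(k + "=" + v));

        ExecutorService dbPool = Executors.newFixedThreadPool(4);
        ExecutorService collectPool = Executors.newFixedThreadPool(collectThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < records; i++) {
                final int record = i;
                // Pool 1: database I/O, result delivered through a Future.
                final Future<String> dbResult = dbPool.submit(() -> uploadToDb(record));
                // Pool 2: wait for that result, then write it through the collector.
                pending.add(collectPool.submit(() -> {
                    out.collect("record-" + record, dbResult.get());
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any failure from either pool
            }
        } finally {
            dbPool.shutdown();
            collectPool.shutdown();
        }
        return sink;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(4, 20).size());
    }
}
```

Because the second pool only blocks on futures produced by the first pool, the two pools cannot starve each other; the order in which records reach the collector is not deterministic.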
