hadoop-common-dev mailing list archives

From Lohit <lohit.vijayar...@yahoo.com>
Subject Re: Multithreaded reduce
Date Mon, 08 Sep 2008 15:08:18 GMT
I might be wrong, but my guess is that this exception comes from the underlying DFS layer.
The output creates a file, and in your case there might be multiple create requests. Can your
threads share a single output collector?

Sent from my iPhone

On Sep 8, 2008, at 12:51 AM, "Goel, Ankur" <ankur.goel@corp.aol.com> wrote:

Hi Folks,

I have a setup where I am using a thread-pool implementation (provided
by the java.util.concurrent package) in the reduce phase to do database
I/O and speed up the database upload. The DB here is MySQL. I decided to
go for additional parallelism via threads because:

1. It considerably speeds up the upload while consuming fewer cluster
resources (i.e. fewer reducers are required).

2. The upload speed is not limited by the reduce task capacity of the
cluster but by the DB's ability to handle max connections simultaneously
and effectively.



Each reduce task has 2 thread pools. One does the DB I/O, and its tasks
return a java.util.concurrent.FutureTask. The other fetches the result
from this future task and does disk I/O, i.e.
outputCollector.collect(...).
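
A minimal sketch of the two-pool layout described above, to make the
data flow concrete. It is not the original code: uploadToDb is a
hypothetical stand-in for the MySQL upload, and a thread-safe queue
stands in for outputCollector.collect(...), so the sketch runs outside
Hadoop and does not reproduce the exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class TwoPoolSketch {
    // Hypothetical stand-in for the real MySQL upload; returns a status string.
    static String uploadToDb(String record) {
        return "uploaded:" + record;
    }

    // Runs the two-stage pipeline and returns everything that was "collected".
    static List<String> run(List<String> records, int dbThreads, int writerThreads) {
        ExecutorService dbPool = Executors.newFixedThreadPool(dbThreads);
        ExecutorService writerPool = Executors.newFixedThreadPool(writerThreads);
        // Thread-safe stand-in (assumption) for outputCollector.collect(...).
        Queue<String> collected = new ConcurrentLinkedQueue<>();

        for (String record : records) {
            // Stage 1: the DB pool performs the upload and hands back a Future.
            Future<String> result = dbPool.submit(() -> uploadToDb(record));
            // Stage 2: a writer thread blocks on the Future, then writes the result.
            writerPool.submit(() -> {
                try {
                    collected.add(result.get());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
        }

        // Stop accepting work and wait for both stages to drain.
        dbPool.shutdown();
        writerPool.shutdown();
        try {
            dbPool.awaitTermination(1, TimeUnit.MINUTES);
            writerPool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return new ArrayList<>(collected);
    }
}
```

Calling TwoPoolSketch.run(records, 4, 1) corresponds to the
configuration with a single writer thread.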



When multiple threads from the second pool try to do disk I/O, I get
an AlreadyBeingCreatedException in the logs. If I set the second pool to
have only 1 thread, then things work fine!



It looks like the output collector was never designed to be used from
multiple threads.
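
If the collector is indeed not thread-safe, one possible approach is to
share a single instance and serialize calls to it. The sketch below
illustrates that idea only: Collector is a hypothetical interface that
mirrors the shape of collect(key, value), not the Hadoop class, and
whether this avoids the AlreadyBeingCreatedException in a real job is
an untested assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the shape of collect(key, value).
interface Collector<K, V> {
    void collect(K key, V value) throws IOException;
}

// Wraps any collector so only one thread at a time is inside collect().
class SynchronizedCollector<K, V> implements Collector<K, V> {
    private final Collector<K, V> delegate;

    SynchronizedCollector(Collector<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void collect(K key, V value) throws IOException {
        delegate.collect(key, value);
    }
}

class SharedCollectorDemo {
    // Starts several threads that all write through one shared wrapper
    // and returns the number of records that reached the sink.
    static int run(int threads, int perThread) {
        List<String> sink = new ArrayList<>(); // deliberately not thread-safe
        Collector<String, String> shared =
                new SynchronizedCollector<String, String>((k, v) -> sink.add(k + "=" + v));

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    try {
                        shared.collect("t" + id, "v" + i);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return sink.size();
    }
}
```

The lock makes disk writes sequential again, so any speedup would have
to come from the DB pool; this is equivalent in effect to the
single-thread second pool that already works.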



Any thoughts on this?



Thanks

-Ankur




