hadoop-common-dev mailing list archives

From "Goel, Ankur" <ankur.g...@corp.aol.com>
Subject Multithreaded reduce
Date Mon, 08 Sep 2008 07:51:47 GMT
Hi Folks,

I have a setup where I am using a thread-pool implementation (from the
java.util.concurrent package) in the reduce phase to do database I/O and
speed up the database upload. The DB here is MySQL. I decided to go for
additional parallelism via threads because:

1. It considerably speeds up the upload while consuming fewer cluster
resources (i.e. fewer reducers are required).

2. The upload speed is not limited by the reduce task capacity of the
cluster but by how many simultaneous connections the DB can handle
effectively.


Each reduce task has 2 thread pools. One does the DB I/O, and its tasks
return a java.util.concurrent.FutureTask. The other pool fetches the
result from this future task to do disk I/O i.e.


When multiple threads from the second pool try to do disk I/O, I get
an AlreadyBeingCreatedException in the logs. If I set the second pool to
have only 1 thread, then things work fine!
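For readers following along, here is a minimal sketch of the two-pool arrangement described above, with the second pool limited to a single thread (the configuration reported to work). It does not use Hadoop or MySQL: `uploadToDb` and the in-memory `sink` list are hypothetical stand-ins for the real DB upload and the disk write, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TwoPoolSketch {
    // Hypothetical stand-in for the real database upload.
    static String uploadToDb(String record) {
        return "uploaded:" + record;
    }

    // Runs the DB work on several threads, but funnels all writes
    // through one writer thread so the sink is never used concurrently.
    static List<String> run(List<String> records, int dbThreads) throws Exception {
        ExecutorService dbPool = Executors.newFixedThreadPool(dbThreads);
        ExecutorService writerPool = Executors.newSingleThreadExecutor();
        // Stand-in for the disk output; only the writer thread adds to it.
        final List<String> sink = new ArrayList<>();
        try {
            List<Future<?>> writes = new ArrayList<>();
            for (final String record : records) {
                // First pool: DB I/O, result delivered through a Future.
                final Future<String> dbResult = dbPool.submit(() -> uploadToDb(record));
                // Second pool: wait for the DB result, then write it out.
                writes.add(writerPool.submit(() -> {
                    sink.add(dbResult.get());
                    return null;
                }));
            }
            // Wait for every write to finish before returning.
            for (Future<?> write : writes) {
                write.get();
            }
        } finally {
            dbPool.shutdown();
            writerPool.shutdown();
        }
        return sink;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(Arrays.asList("a", "b", "c"), 2));
    }
}
```

Because the writer pool has exactly one thread, writes happen one at a time and in submission order, while the DB work still runs in parallel.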


It looks like the output collector was never designed to be used from
multiple threads.
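If the collector is indeed not thread-safe, one possible workaround (an assumption, not something confirmed in this thread) is to serialize all calls to it behind a lock while keeping several threads in the second pool. The sketch below uses a hypothetical `Collector` interface as a stand-in; it is not Hadoop's own collector interface, and `ListCollector` is just an in-memory, non-thread-safe example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SynchronizedCollectorSketch {
    // Hypothetical stand-in for the job's output collector.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Deliberately non-thread-safe collector that stores "key=value" lines.
    static class ListCollector implements Collector<String, String> {
        final List<String> lines = new ArrayList<>();

        public void collect(String key, String value) {
            lines.add(key + "=" + value);
        }
    }

    // Wrapper that admits only one thread at a time into the delegate.
    static class SynchronizedCollector<K, V> implements Collector<K, V> {
        private final Collector<K, V> delegate;

        SynchronizedCollector(Collector<K, V> delegate) {
            this.delegate = delegate;
        }

        public synchronized void collect(K key, V value) {
            delegate.collect(key, value);
        }
    }

    // Has several threads emit records through the wrapper and
    // returns how many records the underlying collector received.
    static int collectFromManyThreads(int threads, int recordsPerThread)
            throws InterruptedException {
        ListCollector raw = new ListCollector();
        Collector<String, String> safe = new SynchronizedCollector<>(raw);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    safe.collect("t" + id, "r" + i);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("timed out waiting for writers");
        }
        return raw.lines.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(collectFromManyThreads(4, 1000));
    }
}
```

The trade-off is that the writes themselves are no longer parallel; only the work done before each `collect` call overlaps. Whether this avoids the exception in a real job would need to be tested.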


Any thoughts on this?




