Mailing-List: core-dev@hadoop.apache.org
Subject: Multithreaded reduce
Date: Mon, 8 Sep 2008 13:21:47 +0530
Message-ID: <8F11722A0562BB4F80A680ED4CFED0D5053F4337@EVSBNG02.ad.office.aol.com>
From: "Goel, Ankur"

Hi Folks,

I have a setup where I use a thread-pool implementation (from the java.util.concurrent package) in the reduce phase to do database I/O and speed up the database upload. The DB here is MySQL. I decided to add parallelism via threads because:

1. It considerably speeds up the upload while consuming fewer cluster resources (i.e. fewer reducers are required).
2. The upload speed is then limited not by the reduce task capacity of the cluster but by the DB's ability to handle the maximum number of simultaneous connections effectively.

Each reduce task has 2 thread pools. The first does the DB I/O, and each task submitted to it returns a java.util.concurrent.FutureTask. The second fetches the results from these future tasks and does the disk I/O, i.e. outputCollector.collect(...).

When multiple threads from the second pool try to do disk I/O, I get an AlreadyBeingCreatedException in the logs. If I set the second pool to have only 1 thread, things work fine!

It looks like the output collector was never meant to be used from multiple threads.

Any thoughts on this?

Thanks
-Ankur
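
For illustration, here is a minimal sketch of the variant that works in the setup described above: the DB I/O stays in a thread pool, but every collect(...) call is made from a single thread. It is only a sketch under assumptions, not a statement about how Hadoop behaves internally. The Collector interface is a hypothetical stand-in for the real output collector so the example compiles without Hadoop, and uploadToDb(...) and reduceLike(...) are hypothetical names for the per-record upload and the reduce body.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SingleWriterReduceSketch {

    // Hypothetical stand-in for the output collector (not the Hadoop interface).
    interface Collector<K, V> {
        void collect(K key, V value) throws Exception;
    }

    // Hypothetical placeholder for the per-record database upload.
    static String uploadToDb(String record) {
        return record + ":ok";
    }

    // Runs the uploads in parallel, then collects every result from the calling thread only.
    static void reduceLike(List<String> records,
                           Collector<String, String> out,
                           int dbThreads) throws Exception {
        ExecutorService dbPool = Executors.newFixedThreadPool(dbThreads);
        try {
            // Submit all DB work first so the uploads overlap.
            List<Future<String>> pending = new ArrayList<>();
            for (String record : records) {
                pending.add(dbPool.submit(() -> uploadToDb(record)));
            }
            // Drain the futures in submission order; only this thread touches the collector.
            for (int i = 0; i < records.size(); i++) {
                out.collect(records.get(i), pending.get(i).get());
            }
        } finally {
            dbPool.shutdown();
        }
    }
}
```

This keeps the parallelism where it matters (the DB connections) and is equivalent to sizing the second pool at 1 thread. If collection really must happen from several threads, another option would be to guard each collect(...) call with a shared lock, assuming the collector tolerates calls from different threads as long as they never overlap.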