From: kenyh
To: core-dev@hadoop.apache.org
Subject: MultithreadedMapper
Date: Wed, 25 Jul 2012 22:47:18 -0700 (PDT)

Multithreaded MapReduce introduces multithreaded execution within a map task.
In Hadoop 1.0.2, MultithreadedMapper implements multithreaded execution of the map function. However, I found that synchronization is needed for record reading (reading the input key and value) and for writing the output. This contention adds heavy overhead: it increases the execution time of a 50 MB wordcount job from 40 seconds to 1 minute. Are there any optimizations for the multithreaded mapper that reduce contention on input reading and output?
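A rough illustration of the contention described above, in plain Java rather than Hadoop code. It assumes the cost comes from taking a shared lock once per input record and once per output record, and shows one possible mitigation: pull several records per lock acquisition and buffer output per thread. SharedReaderSketch, wordCount, and batchSize are made-up names for this sketch; nothing here is part of the MultithreadedMapper API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: several "map" threads share one input reader.
class SharedReaderSketch {

    // Counts words using `threads` workers that pull lines from one shared iterator.
    // batchSize = 1 takes the read lock once per record; larger values once per batch.
    static Map<String, Integer> wordCount(List<String> lines, int threads, int batchSize) {
        if (threads < 1 || batchSize < 1) {
            throw new IllegalArgumentException("threads and batchSize must be >= 1");
        }
        final Iterator<String> reader = lines.iterator(); // shared and not thread-safe
        final Object readLock = new Object();
        final Map<String, Integer> result = new HashMap<>();

        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                Map<String, Integer> local = new HashMap<>(); // thread-private output buffer
                List<String> batch = new ArrayList<>();
                while (true) {
                    batch.clear();
                    synchronized (readLock) { // one lock acquisition per batch of records
                        while (batch.size() < batchSize && reader.hasNext()) {
                            batch.add(reader.next());
                        }
                    }
                    if (batch.isEmpty()) {
                        break; // input exhausted
                    }
                    for (String line : batch) { // the "map" work runs outside the lock
                        for (String word : line.trim().split("\\s+")) {
                            if (!word.isEmpty()) {
                                local.merge(word, 1, Integer::sum);
                            }
                        }
                    }
                }
                synchronized (result) { // merge once per thread, not once per record
                    local.forEach((w, c) -> result.merge(w, c, Integer::sum));
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b a", "b c", "a");
        System.out.println(wordCount(lines, 4, 1));  // lock taken per record
        System.out.println(wordCount(lines, 4, 64)); // lock taken per batch
    }
}
```

With batchSize = 1 this mimics the lock-per-record pattern. Whether batching helps a real job depends on how expensive the map function is relative to reading, which this sketch does not measure.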