Mailing-List: contact common-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-dev@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of lists@nabble.com designates
 216.139.236.26 as permitted sender)
Message-ID: <34213805.post@talk.nabble.com>
Date: Wed, 25 Jul 2012 22:47:18 -0700 (PDT)
From: kenyh <ken.yihan1990@gmail.com>
To: core-dev@hadoop.apache.org
Subject: MultithreadedMapper
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Multithreaded MapReduce introduces multithreaded execution within a map task.
In Hadoop 1.0.2, MultithreadedMapper runs the map function on multiple
threads. However, I found that synchronization is needed both for record
reading (reading the input key and value) and for result output. This
contention adds heavy overhead: it increases the execution time of a 50 MB
wordcount job from 40 seconds to 1 minute. Are there any optimizations for the
multithreaded mapper that reduce the contention on input reading and output?
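
To make the pattern I mean concrete, here is a minimal sketch in plain Java.
It is not the Hadoop source: the class SharedReaderSketch and the method
countWords are hypothetical names, the shared Iterator stands in for the
record reader, and the shared HashMap stands in for the task output. It only
shows where the two locks sit in a design like the one described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multithreaded map task; this is not Hadoop code.
class SharedReaderSketch {

    // Runs `threads` workers over `lines` and returns the word counts.
    static Map<String, Integer> countWords(List<String> lines, int threads)
            throws InterruptedException {
        // Shared input and output; neither is thread-safe on its own.
        final Iterator<String> reader = lines.iterator();
        final Map<String, Integer> output = new HashMap<>();

        Runnable worker = () -> {
            while (true) {
                String line;
                // Contention point 1: every record read takes the reader lock.
                synchronized (reader) {
                    if (!reader.hasNext()) {
                        return;
                    }
                    line = reader.next();
                }
                // The map work itself (splitting the line) runs without a lock.
                for (String word : line.trim().split("\\s+")) {
                    if (word.isEmpty()) {
                        continue;
                    }
                    // Contention point 2: every emitted pair takes the output lock.
                    synchronized (output) {
                        output.merge(word, 1, Integer::sum);
                    }
                }
            }
        };

        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(worker);
            pool.add(t);
            t.start();
        }
        // join() makes the workers' writes visible before the map is returned.
        for (Thread t : pool) {
            t.join();
        }
        return output;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> lines = Arrays.asList("a b a", "b c", "a");
        System.out.println(countWords(lines, 4));
    }
}
```

In a sketch like this, the work done per record outside the locks is very
small for wordcount, so the threads may spend much of their time waiting on
the reader and output locks, which would be consistent with the slowdown I
measured.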
-- 
View this message in context: http://old.nabble.com/MultithreadedMapper-tp34213805p34213805.html
Sent from the Hadoop core-dev mailing list archive at Nabble.com.