Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of lists@nabble.com designates
 216.139.236.158 as permitted sender)
Message-ID: <22496810.post@talk.nabble.com>
Date: Fri, 13 Mar 2009 06:51:01 -0700 (PDT)
From: Doug Cook <nabble@candiru.com>
To: core-user@hadoop.apache.org
Subject: Reduce task going away for 10 seconds at a time
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Hi folks,

I've been debugging a severe performance problem with a Hadoop-based
application (a highly modified version of Nutch). I've recently upgraded to
Hadoop 0.19.1 from a much, much older version, and a reduce that used to
work just fine is now running orders of magnitude more slowly. 

From the logs I can see that my reduce stops making progress for periods that
average almost exactly 10 seconds (with a very narrow distribution around 10
seconds). The stalls occur in various places in my code, more or less in
proportion to how much time I'd expect the task to spend in each place
normally. In other words, it looks as if my code is being interrupted at
random for 10 seconds at a time.
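
One way to quantify stalls like these from a task log is sketched below in
Python. This is an illustration only, not the script used for the numbers
above. It assumes every log line starts with a timestamp of the form
"2009-03-13 06:51:01,123"; the function name find_stalls and the 5-second
threshold are hypothetical choices.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each log line, e.g.
# "2009-03-13 06:51:01,123 INFO ..." (an assumption about the log format).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23  # number of characters occupied by the timestamp


def find_stalls(lines, min_gap_seconds=5.0):
    """Return (gap_seconds, line_before, line_after) for each long pause."""
    stalls = []
    prev_ts = None
    prev_line = None
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            # Skip lines without a leading timestamp (stack traces etc.).
            continue
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap >= min_gap_seconds:
                stalls.append((gap, prev_line.rstrip(), line.rstrip()))
        prev_ts, prev_line = ts, line
    return stalls
```

Feeding it the lines of a reduce task's log and looking at which log messages
bracket each gap should show whether the pauses cluster around one call site
or are spread across the code in proportion to time spent.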

I'm planning to keep digging, but thought that these symptoms might sound
familiar to someone on this list. Ring any bells? Your help is much
appreciated.

Thanks!

Doug Cook