Mailing-List: contact core-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of
 gcjlhu-hadoop-user@m.gmane.org designates 80.91.229.2 as permitted sender)
To: core-user@hadoop.apache.org
From: "Billy Pearson" <sales@pearsonwholesale.com>
Subject: reduce task failing after 24 hours waiting
Date: Wed, 25 Mar 2009 21:23:29 -0500
Lines: 14
Message-ID: <gqeov1$ull$1@ger.gmane.org>
Mime-Version: 1.0
Content-Type: text/plain;
	format=flowed;
	charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
Sender: news <news@ger.gmane.org>

On one of my long-running jobs (about 50-60 hours), I am seeing all active
reduce tasks fail after 24 hours with the error message:

java.io.IOException: Task process exit with nonzero status of 255.
 at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

Is there something in the config that I can change to stop this?

Every time, within 1 minute of the 24-hour mark, they all fail at the same time.
This wastes a lot of resources re-downloading the map outputs and merging them again.

Billy