hadoop-common-user mailing list archives

From Mori Bellamy <mbell...@apple.com>
Subject question on fault tolerance
Date Mon, 11 Aug 2008 17:34:00 GMT
hey all,
i have a pipeline consisting of three MR jobs run back to back. each
job takes an appreciable fraction of a day to complete (30% to 70%),
and i execute the jobs in a blocking fashion:
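(the code snippet referenced here was dropped by the archive. a minimal sketch of what blocking chained execution looks like with the old `org.apache.hadoop.mapred` API, assuming `conf`, `conf2`, and `conf3` are fully configured `JobConf` instances for the three jobs:)

```java
import java.io.IOException;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Pipeline {
    // Assumes conf, conf2, conf3 are JobConf objects already set up
    // with input/output paths, mapper/reducer classes, etc.
    public static void run(JobConf conf, JobConf conf2, JobConf conf3)
            throws IOException {
        // JobClient.runJob() submits the job and blocks until it
        // finishes; it throws IOException if the job fails, so reaching
        // the next line means the previous job reported success.
        JobClient.runJob(conf);
        JobClient.runJob(conf2);
        JobClient.runJob(conf3);
    }
}
```

note that `runJob()` only guarantees ordering within the JVM that calls it: if that client process dies, is killed, or loses its connection between jobs, the later jobs are never submitted even though the earlier ones completed.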

the successful execution of one job does not guarantee that the next
job in the pipeline starts (i.e. i can log on to my taskTracker and
see that conf's job finished successfully, but conf2's job never
started). does anybody else have this problem? can anyone offer advice?

the only thing i can think of is that some other people with access to  
my cluster are accidentally killing jobs, but i doubt that's the case.
