hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: Using Hadoop in a production environment
Date Wed, 14 Jun 2006 03:22:28 GMT
Doing a final rename of the target directory sounds like a good,
simple idea.

If the job failed, it could rename its output to something else instead.

The other approach is simply to drop a completion stamp (create a
./COMPLETE file) when all is done.
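
A rough sketch of both ideas combined, in case it helps the discussion. It
uses Python against a local filesystem as a stand-in, not the Hadoop
FileSystem API, and the names run_job_atomically, is_complete, the ".tmp-"
prefix and the ".failed" suffix are made up for illustration. It assumes the
target directory does not exist yet and that the scratch directory sits on
the same filesystem, so the final rename is a single step.

```python
import os
import shutil
import tempfile

# Marker name taken from the ./COMPLETE suggestion; everything else is hypothetical.
COMPLETE_MARKER = "COMPLETE"


def run_job_atomically(target_dir, job):
    """Run job(scratch_dir), then publish the result by renaming it to target_dir.

    On failure the partial output is renamed to target_dir + ".failed",
    so downstream jobs never see a half-written target_dir.
    """
    parent = os.path.dirname(os.path.abspath(target_dir))
    # Scratch directory next to the target, so the rename stays on one filesystem.
    scratch_dir = tempfile.mkdtemp(prefix=".tmp-", dir=parent)
    try:
        job(scratch_dir)
    except Exception:
        failed_dir = target_dir + ".failed"
        shutil.rmtree(failed_dir, ignore_errors=True)  # drop any older failed attempt
        os.rename(scratch_dir, failed_dir)
        raise
    # Drop the completion stamp before publishing the directory.
    with open(os.path.join(scratch_dir, COMPLETE_MARKER), "w"):
        pass
    os.rename(scratch_dir, target_dir)


def is_complete(target_dir):
    """Downstream jobs can check for the stamp before reading target_dir."""
    return os.path.isfile(os.path.join(target_dir, COMPLETE_MARKER))
```

A downstream job would call is_complete(input_dir) and skip or wait when it
returns False. Whether the rename is atomic on a given distributed filesystem
is a separate question that this sketch does not answer.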

On Jun 13, 2006, at 5:51 PM, Paul Sutter wrote:

> We are starting to string together our disparate Hadoop jobs into a
> running system, and we have a couple of issues that are coming up.
> I'm looking for feedback or suggestions on how we can solve them.
>
> (1) Scheduling Hadoop jobs
> Could an Ant extension be developed to let a complex set of Hadoop
> jobs be controlled using a sort of build script that decides which
> jobs need to be run?
>
> (2) How do we make Hadoop jobs atomic?
> One issue we have is that a failing job can leave directories in an
> inconsistent state, making a mess for the other jobs.
> I'm thinking of an atomic operation we could submit to the
> nameserver. It might consist of multiple directory deletions and
> directory renames, and would either complete in its entirety or not
> at all. In this way, we'd get the equivalent of a
> begin/commit/rollback capability for one simple function.
>
> I'm curious to hear others' thoughts on this topic.
