hadoop-common-dev mailing list archives

From "Paul Sutter" <sut...@gmail.com>
Subject Using Hadoop in a production environment
Date Wed, 14 Jun 2006 00:51:25 GMT
We are starting to string together our disparate Hadoop jobs into a running
system, and a couple of issues are coming up.

I'm looking for feedback or suggestions on how we can solve them.

(1) Scheduling Hadoop jobs

Could an Ant extension be developed to let a complex set of Hadoop jobs be
controlled using a sort of a build script that decides which jobs need to be
run?
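
To make the build-script idea concrete, here is a minimal Python sketch of a make-style runner. It is not an Ant extension and uses no Hadoop API: the Job class, is_stale, and run_pipeline are hypothetical names, and local file modification times stand in for whatever freshness check a DFS-aware version would use. A job runs when its output is missing, when an input is newer than its output, or when one of its dependencies was just re-run. Cycle detection is omitted.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    """Hypothetical description of one job in the pipeline."""
    name: str
    inputs: List[str]          # paths the job reads
    output: str                # path the job produces
    run: Callable[[], None]    # callable that launches the job
    depends_on: List[str] = field(default_factory=list)


def is_stale(job: Job) -> bool:
    """True if the output is missing or older than any existing input."""
    if not os.path.exists(job.output):
        return True
    out_time = os.path.getmtime(job.output)
    return any(
        os.path.getmtime(path) > out_time
        for path in job.inputs
        if os.path.exists(path)
    )


def run_pipeline(jobs: Dict[str, Job]) -> List[str]:
    """Run stale jobs in dependency order; return the names that ran."""
    executed: List[str] = []
    visited = set()

    def visit(name: str) -> None:
        if name in visited:
            return
        visited.add(name)
        job = jobs[name]
        for dep in job.depends_on:
            visit(dep)  # dependencies are handled first
        # Re-run if stale, or if any dependency was re-run in this pass.
        if is_stale(job) or any(dep in executed for dep in job.depends_on):
            job.run()
            executed.append(name)

    for name in jobs:
        visit(name)
    return executed
```

Running the same pipeline twice in a row should execute every job the first time and nothing the second time, which is the "decide which jobs need to be run" behaviour described above.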

(2) How do we make Hadoop jobs atomic?

One issue we have is that a failing job can leave directories in an
inconsistent state, making a mess for the other jobs.

I'm thinking of an atomic operation we could submit to the nameserver. It
might consist of multiple directory deletions and directory renames, and
would either complete in entirety, or not at all. And in this way, we'd get
the equivalent of a begin/commit/rollback capability for one simple
function.
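
The intended semantics can be sketched on a local filesystem, as below. This is only an illustration under stated assumptions: apply_atomically is a hypothetical name, not a Hadoop call, and all paths are assumed to live on the same filesystem as root. Deletions are parked in a trash directory so they can be undone, and every completed step is logged so a failure rolls back in reverse order. It is not crash-safe and other readers can observe intermediate states, which is exactly why the real operation would have to run inside the nameserver.

```python
import os
import shutil
import tempfile


def apply_atomically(root, deletes, renames):
    """Apply all deletions and renames, or undo everything on failure.

    root    -- directory on the same filesystem as every path involved
    deletes -- list of directories to remove
    renames -- list of (source, destination) pairs
    """
    # Park "deleted" directories here so they can be restored on rollback.
    trash = tempfile.mkdtemp(prefix=".txn-trash-", dir=root)
    undo = []  # (current_location, original_location) for each finished step
    try:
        for index, path in enumerate(deletes):
            parked = os.path.join(trash, str(index))
            os.rename(path, parked)
            undo.append((parked, path))
        for src, dst in renames:
            if os.path.exists(dst):
                raise FileExistsError(dst)
            os.rename(src, dst)
            undo.append((dst, src))
    except Exception:
        # Rollback: reverse every completed step, newest first.
        for current, original in reversed(undo):
            os.rename(current, original)
        shutil.rmtree(trash, ignore_errors=True)
        raise
    # Commit: the parked directories are now really deleted.
    shutil.rmtree(trash)
```

A typical use would be a job that writes to a temporary directory and then, in one call, deletes the old output and renames the temporary directory into its place. If any step fails, the old output is still where it was.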

I'm curious to hear others' thoughts on this topic.
