hadoop-common-user mailing list archives

From Chris K Wensel <ch...@wensel.net>
Subject Re: Groovy Scripting for Hadoop
Date Tue, 06 May 2008 15:08:33 GMT
> Have you seen my grool system which allows simple MR programs to be written
> simply?

Yes, I did take a look. Your success with it was part of the reason I
went with Groovy first, instead of Jython or JRuby.

> In addition, I have been working on a layer over Zookeeper to handle
> collection of data feed oriented information about availability of files
> containing data.  This is similar in some sense to Amazon's simple queue
> service except that it describes content as files, rather than opaque blobs
> passed through queues.  This allows simpler retrospective processing of
> data.  It would make a very good substrate for something like Cascades since
> it would allow clean coordination semantics between multiple workers on
> independent machines as well as provide notification of new (if desired)
> without polling.  That would allow much lower latency systems to be built.

That sounds really cool. I haven't played with ZooKeeper yet. Most of
our coordination has been easily satisfied with SQS and Cascading's
internal topological scheduler (and its associated event listener
interfaces). But that will only go so far.

We are currently testing an Amazon EC2/Hadoop 'on demand' cluster tool
that was extraordinarily trivial to implement (though it's not generic
enough to share yet). But I can see this falling apart without
something like ZooKeeper as things get more sophisticated, or need to
run outside AWS.

Chris K Wensel
