hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "BristolHadoopWorkshop" by SteveLoughran
Date Thu, 13 Aug 2009 14:07:57 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/BristolHadoopWorkshop

The comment on the change is:
Add discussion section

------------------------------------------------------------------------------
- = Bristol Hadoop Workshop =
+ = Bristol Hadoop Workshop, University of Bristol, August 10, 2009 =
  
  This was a little local workshop put together by Simon Metson of Bristol University, and
Steve Loughran of HP, to get some of the local Hadoop users in a room and talk about our ongoing
work.
  
  These presentations were intended to start discussion and thought:
-  
+ 
    * [http://www.slideshare.net/steve_l/hadoop-futures Hadoop Futures] (Tom White, Cloudera)
    * [http://www.slideshare.net/steve_l/hadoop-hep Hadoop and High-Energy Physics] (Simon
Metson, Bristol University)
    * [http://www.slideshare.net/steve_l/hdfs HDFS] (Johan Oskarsson, Last.fm)
@@ -27, +27 @@

  
  == Long-Haul Hadoop ==
  
- This talk discussed the notion of a long-haul interface to Hadoop. 
+ This talk discussed the notion of a long-haul interface to Hadoop.
  
  This is a recurrent theme in various bug reports: anywhere where people
  want to submit jobs from a distance and keep an eye on them. Often this
@@ -44, +44 @@

  a sequence of MR jobs, and any Java classes which implement the Tool
  interface. The Tool would be run in the datacentre, on some
  medium-availability host, so you could switch your laptop off and know
- that the program was still running. 
+ that the program was still running.
  
  There is work underway at Yahoo! with Oozie, a workflow system for
  Hadoop; Cascading and Pig Latin are also languages to describe
@@ -61, +61 @@

  two
  
  WS-*: the big, comfortable, safe long-haul option, the Airbus A380.
- You, the passenger, get looked after by the cabin crew. 
+ You, the passenger, get looked after by the cabin crew.
  
  The floatplane. Agile, can get around fast, but you read the location of
  the life vest instructions very carefully, make a note of the exit in
@@ -71, +71 @@

  
  Two RESTful world views were discussed
  
-  * A pure REST: PUT/DELETE model of workflow objects, in which even their queue state is
manipulated using the full REST model. This is clean, ideal for clients such as Restlet, and
HTML5 browsers. 
+  * A pure REST: PUT/DELETE model of workflow objects, in which even their queue state is
manipulated using the full REST model. This is clean, and ideal for clients such as Restlet and
HTML5 browsers.
   * An HTTP POST model, in which work is POSTed to a queue server and URLs are returned;
operations on the queued workflows use POST or PUT, and GET retrieves state updates.
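A minimal sketch of the HTTP POST model, assuming an in-memory stand-in for the queue server rather than a real HTTP stack. The class `QueueServer`, its methods, the `start`/`cancel` operations and the example URL are all hypothetical and are not Mombasa's API. Each method mirrors one HTTP exchange and returns a `(status, body)` pair; a PUT-based variant would change only which verb carries the operation.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Workflow:
    spec: str              # hypothetical: whatever describes the work (MR job, Tool, script)
    state: str = "queued"


class QueueServer:
    """Hypothetical in-memory stand-in for a queue server.

    Each method mirrors one HTTP exchange in the "HTTP POST" model and
    returns a (status_code, body) pair instead of a real HTTP response.
    """

    # Hypothetical operations and the state each one moves a workflow into.
    # State-machine checks (e.g. restarting a cancelled job) are omitted.
    OPERATIONS = {"start": "running", "cancel": "cancelled"}

    def __init__(self, base_url="http://queue.example.org/workflows"):
        self.base_url = base_url
        self._ids = itertools.count(1)
        self._workflows = {}

    def post_work(self, spec):
        """POST to the queue: store new work, return 201 and the workflow's URL."""
        wid = next(self._ids)
        self._workflows[wid] = Workflow(spec)
        return 201, f"{self.base_url}/{wid}"

    def _lookup(self, url):
        """Map a workflow URL back to its record, or None if unknown."""
        prefix = self.base_url + "/"
        if not url.startswith(prefix):
            return None
        suffix = url[len(prefix):]
        return self._workflows.get(int(suffix)) if suffix.isdigit() else None

    def post_operation(self, url, operation):
        """POST an operation to an existing workflow's URL."""
        wf = self._lookup(url)
        if wf is None:
            return 404, None
        if operation not in self.OPERATIONS:
            return 400, wf.state
        wf.state = self.OPERATIONS[operation]
        return 200, wf.state

    def get(self, url):
        """GET the workflow's current state (clients poll this for updates)."""
        wf = self._lookup(url)
        return (404, None) if wf is None else (200, wf.state)


if __name__ == "__main__":
    server = QueueServer()
    status, url = server.post_work("wordcount.jar")
    print(status, url)                           # 201 http://queue.example.org/workflows/1
    print(server.get(url))                       # (200, 'queued')
    print(server.post_operation(url, "cancel"))  # (200, 'cancelled')
```

The returned URL is the client's only handle on the work, which is what lets a caller disconnect and later GET the state again.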
  
  Steve gave a partial demonstration of Mombasa, his prototype
  "long-haul route to the elephants". This consists of:
  
   * A RESTy interface built on JAX-RS, using the Jersey runtime hosted under Jetty, deployed
in-datacentre by SmartFrog.
-    
+ 
   * A Portlet GUI to the same set of operations, this time running in-datacentre in a portlet
server (which may be Liferay-on-Tomcat, but
   does not need to be). It implicitly implements the HTTP POST model.
-  
+ 
  Currently the portlet is not using the long-haul API itself, though
  there is no reason why it should not, in which case it will not only
- drive the API, it will test it. 
+ drive the API, it will test it.
  
  Other Portlets will apparently provide cluster management by talking to
  the relevant "cloud" APIs: Add/decommission nodes, view logs, etc, and
  simple HDFS file access.
  
  Long-haul filesystem access is another issue. Ideally, WebDAV would be
- good, as there are so many clients and it is a pure REST API. 
+ good, as there are so many clients and it is a pure REST API.
  But parts of the WebDAV spec are odd (same FS semantics as Win98/FAT),
  and you can be sure of interop grief. Amazon S3 is simpler, as long as
- you avoid their daft authentication mechanism. 
+ you avoid their daft authentication mechanism.
  
+ Discussion: Simon mentioned that they had a REST API to some of the CERN job submission
services, and later sent out [https://twiki.cern.ch/twiki/bin/view/CMS/DMWTTutorialDatabaseREST#REST_classes_in_Webtools a link].
There was general agreement that you need to be able to push out more than just MR jobs.
+ 
