pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "PigAbstractionLayer" by AntonioMagnaghi
Date Tue, 20 Nov 2007 23:41:10 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by AntonioMagnaghi:
http://wiki.apache.org/pig/PigAbstractionLayer

------------------------------------------------------------------------------
- ##master-page:FrontPage
- #format wiki
- #language en
- #pragma section-numbers off
- 
  = Pig Abstraction Layer =
  
  == Introduction and Rationale ==
@@ -33, +28 @@

  
  - '''Data Storage''': provides functionalities that pertain to storing and retrieving data.
It encapsulates the typical operations supported by file systems like creating, opening (for
reading or writing) a data object. 
  
- - '''Query Execution/Tracking''': provides functionalities to parse a Pig Latin program
and submit a compiled Pig job to a back-end. This API should enable the front-end to track
the current status of a job, its progress, diagnostic information and possibly to terminate
it.
+ - '''Query Execution Engine''': provides functionalities to parse a Pig Latin program and
submit a compiled Pig job to a back-end. This API should enable the front-end to track the
current status of a job, its progress, diagnostic information and possibly to terminate it.
  
  The sections below provide some initial suggestions for possible APIs for the Data Storage
and Query Execution abstractions.
  
@@ -317, +312 @@

  
     PigAbstractionHadoopBackEnd: sample code that implements the proposed APIs on top of
Hadoop.
  
+ === Execution Engine ===
+ 
+ {{{
+ package org.apache.pig.executionengine;
+ 
+ import java.util.Collection;
+ 
+ /**
+  * This is the main interface that various execution engines
+  * need to support and it is also the main interface that Pig
+  * will need to use to submit jobs for execution, retrieve information
+  * about their progress and possibly terminate them.
+  *
+  */
+ 
+ public interface ExecutionEngine {
+ 
+ 	/**
+ 	 * Placeholder for possible initialization activities.
+ 	 */
+ 	public void init();
+ 
+ 	/**
+ 	 * Clean-up and releasing of resources.
+ 	 */
+ 	public void close();
+ 
+ 	
+ 	/**
+ 	 * Provides configuration information about the execution engine itself.
+ 	 * 
+ 	 * @return - information about the configuration used to connect to execution engine
+ 	 */
+ 	public ExecutionEngineProperties getConfiguration();
+ 	
+ 	/**
+ 	 * Provides a way to dynamically change configuration parameters
+ 	 * at the Execution Engine level.
+ 	 * 
+ 	 * @param newConfiguration - the new configuration settings
+ 	 * @throws ExecutionEngineException when configuration conflicts are detected
+ 	 * 
+ 	 */
+ 	public void updateConfiguration(ExecutionEngineProperties newConfiguration) 
+                       throws ExecutionEngineException;
+ 	
+ 	/**
+ 	 * Provides statistics on the Execution Engine: number of nodes,
+ 	 * node failure rates, average load, average job wait time...
+ 	 * @return ExecutionEngineProperties
+ 	 */
+ 	public ExecutionEngineProperties getStatistics();
+ 
+ 	/**
+ 	 * Compiles a logical plan into a physical plan, given a set of configuration
+ 	 * properties that apply at the plan-level. For instance desired degree of 
+ 	 * parallelism for this plan, which could be different from the "default"
+ 	 * one set at the execution engine level.
+ 	 * 
+ 	 * @param plan the logical plan to compile
+ 	 * @param properties plan-level configuration properties
+ 	 * @return the compiled physical plan
+ 	 */
+ 	public ExecutionEnginePhysicalPlan compile(ExecutionEngineLogicalPlan plan,
+ 		                                     ExecutionEngineProperties properties);
+ 	
+ 	/**
+ 	 * This may be useful to support admin functionalities.
+ 	 * 
+ 	 * @return a collection of jobs "known" to the execution engine,
+ 	 * say jobs currently queued up or running (this can be determined 
+ 	 * by obtaining the properties of the job)
+ 	 * 
+ 	 * @throws ExecutionEngineException if, for example, the user does not
+ 	 * have the privileges to obtain this information...
+ 	 */
+ 	public Collection<ExecutionEnginePhysicalPlan> allPhysicalPlans() throws
+ 	    ExecutionEngineException;
+ }
+ }}}
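
The configuration contract described above (read the current settings, submit new ones, and get an exception on a conflict) can be illustrated with a small in-memory sketch. This is not part of the proposed API: `EngineConfigSketch`, `InMemoryEngine`, the `"parallelism"` key, and the simplified `ExecutionEngineProperties` and `ExecutionEngineException` stand-ins are all hypothetical, since the page does not define those types.

```java
import java.util.HashMap;
import java.util.Map;

public class EngineConfigSketch {

    /** Hypothetical stand-in: the page does not define ExecutionEngineProperties. */
    static class ExecutionEngineProperties extends HashMap<String, String> {
        ExecutionEngineProperties() { super(); }
        ExecutionEngineProperties(Map<String, String> other) { super(other); }
    }

    /** Hypothetical stand-in for the exception type named in the interface. */
    static class ExecutionEngineException extends Exception {
        ExecutionEngineException(String message) { super(message); }
    }

    /** Toy engine covering only the configuration methods of the proposal. */
    static class InMemoryEngine {
        private final ExecutionEngineProperties config = new ExecutionEngineProperties();

        public void init() {
            // Assumed default; "parallelism" is an illustrative key, not a real setting.
            config.put("parallelism", "4");
        }

        public ExecutionEngineProperties getConfiguration() {
            // Return a copy so callers cannot bypass updateConfiguration.
            return new ExecutionEngineProperties(config);
        }

        public void updateConfiguration(ExecutionEngineProperties newConfiguration)
                throws ExecutionEngineException {
            // One example of a "configuration conflict": a non-positive parallelism.
            String p = newConfiguration.get("parallelism");
            if (p != null && Integer.parseInt(p) < 1) {
                throw new ExecutionEngineException("parallelism must be >= 1, got " + p);
            }
            config.putAll(newConfiguration);
        }
    }

    /** Tries an invalid update and reports what the engine kept. */
    static String demo() {
        InMemoryEngine engine = new InMemoryEngine();
        engine.init();
        ExecutionEngineProperties bad = new ExecutionEngineProperties();
        bad.put("parallelism", "0");
        try {
            engine.updateConfiguration(bad);
            return "accepted";
        } catch (ExecutionEngineException e) {
            return "rejected; parallelism=" + engine.getConfiguration().get("parallelism");
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Validating before applying any change keeps a rejected update from leaving the engine half-configured; a real back-end would likely have its own rules for what counts as a conflict.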
+ 
+ === Execution Engine Physical Plan ===
+ Interface to manage a Physical Plan.
+ 
+ {{{
+ package org.apache.pig.executionengine;
+ 
+ public interface ExecutionEnginePhysicalPlan {
+ 
+ 	/**
+ 	 * Execute the physical plan.
+ 	 * This call is non-blocking; use getStatistics to poll for
+ 	 * information about the job.
+ 	 * 
+ 	 * @throws ExecutionEngineException
+ 	 */
+ 	public void execute() throws ExecutionEngineException;
+ 
+ 	/**
+ 	 * A job may have properties, like a priority, degree of parallelism...
+ 	 * Some of these properties may be inherited from the ExecutionEngine
+ 	 * configuration; others may have been set specifically for this job.
+ 	 * For instance, a job scheduler may assign low priority to
+ 	 * jobs started automatically for maintenance purposes.
+ 	 * 
+ 	 * @return set of properties
+ 	 */
+ 	public ExecutionEngineProperties getConfiguration();
+ 	
+ 	/**
+ 	 * Some properties of the job may be changed, say the priority...
+ 	 * 
+ 	 * @param configuration the new job-level configuration settings
+ 	 * @throws ExecutionEngineException if a change is not allowed. For instance,
+ 	 * some job-level properties cannot override Execution-Engine-level properties,
+ 	 * or some properties can be changed only in certain states of the
+ 	 * job: say, once the job is started, the parallelism level may not be changed...
+ 	 */
+ 	public void updateConfiguration(ExecutionEngineProperties configuration)
+ 		throws ExecutionEngineException;
+ 	
+ 	/**
+ 	 * Hook to provide asynchronous notifications.
+ 	 * 
+ 	 */
+ 	public void notify(ExecutionEngineNotificationEvent event);
+ 	
+ 	/**
+ 	 * Kills current job.
+ 	 * 
+ 	 * @throws ExecutionEngineException
+ 	 */
+ 	public void kill() throws ExecutionEngineException;
+ 	
+ 	/**
+ 	 * Provides information about the job. This can include its state (not submitted,
+ 	 * e.g. the execute method has not been called yet; not running, e.g. execute
+ 	 * has been issued but the job is waiting; running; completed; aborted...)
+ 	 * and progress information.
+ 	 * 
+ 	 * @return statistics and state information for this job
+ 	 */
+ 	public ExecutionEngineProperties getStatistics();
+ }
+ }}}
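
The job life cycle implied by the comments above (non-blocking `execute`, state read back through `getStatistics`, and `kill` to abort) can be sketched with a toy plan. Everything here is an assumption made for illustration: the `State` names are loosely taken from the `getStatistics` comment, `schedulerTick` is a hypothetical helper standing in for the back-end scheduler, and a plain `Map` replaces `ExecutionEngineProperties`, which the page does not define.

```java
import java.util.HashMap;
import java.util.Map;

public class PlanLifecycleSketch {

    /** Hypothetical stand-in for the exception type named in the interface. */
    static class ExecutionEngineException extends Exception {
        ExecutionEngineException(String message) { super(message); }
    }

    /** Assumed state names, loosely based on the getStatistics comment. */
    enum State { NOT_SUBMITTED, WAITING, RUNNING, COMPLETED, ABORTED }

    /** Toy plan covering execute, kill and getStatistics only. */
    static class ToyPhysicalPlan {
        private State state = State.NOT_SUBMITTED;

        public void execute() throws ExecutionEngineException {
            if (state != State.NOT_SUBMITTED) {
                throw new ExecutionEngineException("plan already submitted");
            }
            state = State.WAITING; // returns immediately; nothing runs here
        }

        /** Hypothetical helper: one scheduling step performed by the back-end. */
        void schedulerTick() {
            if (state == State.WAITING) {
                state = State.RUNNING;
            } else if (state == State.RUNNING) {
                state = State.COMPLETED;
            }
        }

        public void kill() throws ExecutionEngineException {
            if (state == State.COMPLETED || state == State.ABORTED) {
                throw new ExecutionEngineException("job already finished: " + state);
            }
            state = State.ABORTED;
        }

        public Map<String, String> getStatistics() {
            Map<String, String> stats = new HashMap<>();
            stats.put("state", state.name());
            return stats;
        }
    }

    /** Submits a plan, lets it start, kills it, and records each observed state. */
    static String demo() {
        ToyPhysicalPlan plan = new ToyPhysicalPlan();
        StringBuilder trace = new StringBuilder(plan.getStatistics().get("state"));
        try {
            plan.execute();
            trace.append(" -> ").append(plan.getStatistics().get("state"));
            plan.schedulerTick();
            trace.append(" -> ").append(plan.getStatistics().get("state"));
            plan.kill();
            trace.append(" -> ").append(plan.getStatistics().get("state"));
        } catch (ExecutionEngineException e) {
            trace.append(" -> error: ").append(e.getMessage());
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because `execute` only moves the plan to a waiting state, the front-end learns about progress solely by polling `getStatistics` (or, in the proposed interface, through the `notify` hook).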
+ 
