falcon-commits mailing list archives

From venkat...@apache.org
Subject svn commit: r1491876 [2/2] - in /incubator/falcon: site/ site/docs/ site/images/ site/slides/ site/wiki/ trunk/ trunk/src/site/resources/images/ trunk/src/site/resources/slides/ trunk/src/site/twiki/ trunk/src/site/twiki/docs/
Date Tue, 11 Jun 2013 16:59:54 GMT
Modified: incubator/falcon/trunk/src/site/resources/slides/falcon-user-guide.html
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/src/site/resources/slides/falcon-user-guide.html?rev=1491876&r1=1491875&r2=1491876&view=diff
==============================================================================
--- incubator/falcon/trunk/src/site/resources/slides/falcon-user-guide.html (original)
+++ incubator/falcon/trunk/src/site/resources/slides/falcon-user-guide.html Tue Jun 11 16:59:54 2013
@@ -32,10 +32,11 @@
 <!-- Begin slides. Just make elements with a class of slide. -->
 
 <section class="slide" id="intro">
-    <h1>Apache Falcon - User Guide</h1>
+    <h2>Apache Falcon - User Guide</h2>
+    <h3>Coming soon .... </h3>
 </section>
 
-<section class="slide" id="build">
+<!--<section class="slide" id="build">
     <h2>Building Apache Falcon</h2>
     <ol>
         <li>
@@ -44,7 +45,7 @@
         </li>
         <li>
             <p>Compile the project</p>
-            <pre><code>mvn -DskipTests clean package</code></pre>
+            <pre><code>mvn -DskipTests clean package verify</code></pre>
         </li>
         <li>
             <p>Optionally run the tests</p>
@@ -93,7 +94,7 @@
             <p>TBD: Schedule a sample process</p>
         </li>
     </ul>
-</section>
+</section>-->
 
 <!-- End slides. -->
 

Modified: incubator/falcon/trunk/src/site/twiki/docs/FalconArchitecture.twiki
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/src/site/twiki/docs/FalconArchitecture.twiki?rev=1491876&r1=1491875&r2=1491876&view=diff
==============================================================================
--- incubator/falcon/trunk/src/site/twiki/docs/FalconArchitecture.twiki (original)
+++ incubator/falcon/trunk/src/site/twiki/docs/FalconArchitecture.twiki Tue Jun 11 16:59:54 2013
@@ -16,19 +16,19 @@
 ---++ Architecture
 ---+++ Introduction
 Falcon is a feed and process management platform over hadoop. Falcon essentially transforms
user's feed
-and process configurations into repeated actions through a standard workflow engine. Falcon
by itself
-doesn't do any heavy lifting. All the functions and workflow state management requirements
are delegated
-to the workflow scheduler. The only thing that Falcon maintains is the dependencies and relationship
between
-these entities. This is adequate to provide integrated and seamless experience to the developers
using
+and process configurations into repeated actions through a standard workflow engine (Apache
Oozie). Falcon
+by itself doesn't do any heavy lifting. All the functions and workflow state management requirements
are
+delegated to the workflow scheduler. The only thing that Falcon maintains is the dependencies
and relationship
+between these entities. This is adequate to provide integrated and seamless experience to
the developers using
 the falcon platform.
 
 ---+++ Falcon Architecture - Overview
 <img src="../images/Architecture.png" height="400" width="600" />
 
 ---+++ Scheduler
-Falcon system has picked Oozie as the default scheduler. However the system is open for integration
with
+Falcon system has picked Apache Oozie as the default scheduler. However the system is open
for integration with
 other schedulers. Lot of the data processing in hadoop requires scheduling to be based on
both data availability
-as well as time. Oozie currently supports these capabilities off the shelf and hence the
choice.
+as well as time. Apache Oozie currently supports these capabilities off the shelf and hence
the choice.
 
 ---+++ Control flow
 Though the actual responsibility of the workflow is with the scheduler (Oozie), Falcon remains
in the
@@ -85,89 +85,6 @@ individual operations performed are reco
 the overall user action. In some cases, it is not possible to undo the action. In such cases,
Falcon attempts
 to keep the system in an consistent state.
 
----++ Entity Management actions
-
----+++ Submit:
-Entity submit action allows a new cluster/feed/process to be setup within Falcon. Submitted
entity is not
-scheduled, meaning it would simply be in the configuration store within Falcon. Besides validating
against
-the schema for the corresponding entity being added, the Falcon system would also perform
inter-field
-validations within the configuration file and validations across dependent entities.
-
----+++ List:
-List all the entities within the falcon config store for the entity type being requested.
This will include
-both scheduled and submitted entity configurations.
-
----+++ Dependency:
-Returns the dependencies of the requested entity. Dependency list include both forward and
backward
-dependencies (depends on & is dependent on). For ex, a feed would show process that are
dependent on the
-feed and the clusters that it depends on.'
-
----+++ Schedule:
-Feeds or Processes that are already submitted and present in the config store can be scheduled.
Upon schedule,
-Falcon system wraps the required repeatable action as a bundle of oozie coordinators and
executes them on the
-Oozie scheduler. (It is possible to extend Falcon to use an alternate workflow engine other
than Oozie).
-Falcon overrides the workflow instance's external id in Oozie to reflect the process/feed
and the nominal
-time. This external Id can then be used for instance management functions.
-
----+++ Suspend:
-This action is applicable only on scheduled entity. This triggers suspend on the oozie bundle
that was
-scheduled earlier through the schedule function. No further instances are executed on a suspended
process/feed.
-
----+++ Resume:
-Puts a suspended process/feed back to active, which in turn resumes applicable oozie bundle.
-
----+++ Status:
-Gets the current status of the entity.
-
----+++ Definition:
-Gets the current entity definition as stored in the configuration store. Please note that
user documentations
-in the entity will not be retained.
-
----+++ Delete:
-Delete operation on the entity removes any scheduled activity on the workflow engine, besides
removing the
-entity from the falcon configuration store. Delete operation on an entity would only succeed
if there are
-no dependent entities on the deleted entity.
-
----+++ Update:
-Update operation allows an already submitted/scheduled entity to be updated. Cluster update
is currently
-not allowed. Feed update can cause cascading update to all the processes already scheduled.
The following
-set of actions are performed in Oozie to realize an update.
-
-   * Suspend the previously scheduled Oozie coordinator. This is prevent any new action from
being triggered.
-   * Update the coordinator to set the end time to "now"
-   * Resume the suspended coordiantors
-   * Schedule as per the new process/feed definition with the start time as "now"
-
----++ Instance Management actions
-
-
-Instance Manager gives user the option to control individual instances of the process based
on their instance start time (start time of that instance). Start time needs to be given in
standard TZ format. Example:   01 Jan 2012 01:00  => 2012-01-01T01:00Z
-
-All the instance management operations (except running) allow single instance or list of
instance within a Date range to be acted on. Make sure the dates are valid. i.e are within
the start and  end time of process itself. 
-
-For every query in instance management the process name is a compulsory parameter. 
-
-Parameters -start and -end are used to mention the date range within which you want the instance
to be operated upon. 
-
--start:   using only  "-start" without  "-end"  will conduct the desired operation only on
single instance given by date along with start.
-
--end:  "-end"  can only be used along with "-start" . It corresponds to the end date till
which instance need to operated upon. 
-
-   * 1. *status*: -status option via CLI can be used to get the status of a single or multiple
instances.  If the instance is not yet materialized but is within the process validity range,
WAITING is returned as the state.Along with the status of the instance log location is also
returned.
-
-
-   * 2.	*running*: -running returns all the running instance of the process. It does not
take any start or end dates but simply return all the instances in state RUNNING at that given
time. 
-
-   * 3.	*rerun*: -rerun is the option that you will use most often from instance management.
As the name suggest this option is used to rerun a particular instance or instances of the
process. The rerun option reruns all parent workflow for the instance, which in turn rerun
all the sub-workflows for it. This option is valid for any instance in terminal state, i.e.
KILLED, SUCCEEDED, FAILED. User can also set properties in the request, which will give options
what types of actions should be rerun like, only failed, run all etc. These properties are
dependent on the workflow engine being used along with falcon.
-   
-   * 4. *suspend*: -suspend is used to suspend a instance or instances  for the given process.
This option pauses the parent workflow at the state, which it was in at the time of execution
of this command. This command is similar to SUSPEND process command in functionality only
difference being, SUSPEND process suspends all the instance whereas suspend instance suspend
only that instance or instances in the range. 
-
-   * 5.	*resume*: -resume option is used to resume any instance that  is in suspended state. (Note: due to a bug in oozie -resume option in some cases may not actually resume the suspended instance/ instances)
-   * 6. *kill*: -kill option can be used to kill an instance or multiple instances 
-
-
-In all the cases where your request is syntactically correct but logically not, the instance
/ instances are returned with the same status as earlier. Example:  trying to resume a KILLED
 / SUCCEEDED instance will return the instance with KILLED / SUCCEEDED, without actually performing
any operation. This is so because only an instance in SUSPENDED state can be resumed. Same
thing is valid for rerun a SUSPENDED or RUNNING options etc. 
-
 ---++ Retention
 In coherence with it's feed lifecycle management philosophy, Falcon allows the user to retain
 data in the system
 for a specific period of time for a scheduled feed. The user can specify the retention period
in the respective 
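
Note on the instance management text removed from FalconArchitecture.twiki above: it states that rerun applies only to instances in a terminal state (KILLED, SUCCEEDED, FAILED), that resume applies only to SUSPENDED instances, and that a logically invalid request returns the instance with its status unchanged. A minimal Python sketch of those two rules follows. The names `ALLOWED_STATES` and `is_applicable` are hypothetical and not part of Falcon; the text gives no explicit rule for suspend or kill, so they are left out.

```python
from typing import Dict, FrozenSet

# States the text calls terminal; rerun is described as valid only for these.
TERMINAL_STATES: FrozenSet[str] = frozenset({"KILLED", "SUCCEEDED", "FAILED"})

# Source states per action, limited to the two rules the text spells out.
# Hypothetical table for illustration, not a Falcon data structure.
ALLOWED_STATES: Dict[str, FrozenSet[str]] = {
    "resume": frozenset({"SUSPENDED"}),
    "rerun": TERMINAL_STATES,
}


def is_applicable(action: str, status: str) -> bool:
    """Return True if, per the text, `action` takes effect on an instance in `status`."""
    if action not in ALLOWED_STATES:
        raise ValueError(f"no rule stated in the text for action {action!r}")
    return status in ALLOWED_STATES[action]


if __name__ == "__main__":
    # Resuming a KILLED instance is syntactically fine but logically a no-op.
    print(is_applicable("resume", "KILLED"))
    print(is_applicable("rerun", "FAILED"))
```

A caller could use such a check to warn the user before sending a request that would leave the instance untouched.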

Modified: incubator/falcon/trunk/src/site/twiki/docs/FalconCLI.twiki
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/src/site/twiki/docs/FalconCLI.twiki?rev=1491876&r1=1491875&r2=1491876&view=diff
==============================================================================
--- incubator/falcon/trunk/src/site/twiki/docs/FalconCLI.twiki (original)
+++ incubator/falcon/trunk/src/site/twiki/docs/FalconCLI.twiki Tue Jun 11 16:59:54 2013
@@ -6,82 +6,163 @@ FalconCLI is a interface between user an
 
 ---+++Submit
 
-Submit option is used to set up entity definition.
+Entity submit action allows a new cluster/feed/process to be setup within Falcon. Submitted
entity is not
+scheduled, meaning it would simply be in the configuration store within Falcon. Besides validating
against
+the schema for the corresponding entity being added, the Falcon system would also perform
inter-field
+validations within the configuration file and validations across dependent entities.
 
+<verbatim>
 Example: 
 $FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml
+</verbatim>
 
 Note: The url option in the above and all subsequent commands is optional. If not mentioned
it will be picked from client.properties file. If the option is not provided and also not
set in client.properties, Falcon CLI will fail.
 
 ---+++Schedule
 
-Once submitted, an entity can be scheduled using schedule option. Process and feed can only
be scheduled.
+Feeds or Processes that are already submitted and present in the config store can be scheduled.
Upon schedule,
+Falcon system wraps the required repeatable action as a bundle of oozie coordinators and
executes them on the
+Oozie scheduler. (It is possible to extend Falcon to use an alternate workflow engine other
than Oozie).
+Falcon overrides the workflow instance's external id in Oozie to reflect the process/feed
and the nominal
+time. This external Id can then be used for instance management functions.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [process|feed] -name <<name>> -schedule
 
 Example:
 $FALCON_HOME/bin/falcon entity  -type process -name sampleProcess -schedule
+</verbatim>
 
 ---+++Suspend
 
-Suspend on an entity results in suspension of the oozie bundle that was scheduled earlier
through the schedule function. No further instances are executed on a suspended entity. Only
schedulable entities(process/feed) can be suspended.
+This action is applicable only on scheduled entity. This triggers suspend on the oozie bundle
that was
+scheduled earlier through the schedule function. No further instances are executed on a suspended
process/feed.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -suspend
+</verbatim>
 
 ---+++Resume
 
 Puts a suspended process/feed back to active, which in turn resumes applicable oozie bundle.
 
+<verbatim>
 Usage:
  $FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -resume
+</verbatim>
 
 ---+++Delete
 
-Delete removes the submitted entity definition for the specified entity and put it into the
archive.
+Delete operation on the entity removes any scheduled activity on the workflow engine, besides
removing the
+entity from the falcon configuration store. Delete operation on an entity would only succeed
if there are
+no dependent entities on the deleted entity.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [cluster|feed|process] -name <<name>> -delete
+</verbatim>
 
 ---+++List
 
-Entities of a particular type can be listed with list sub-command.
+List all the entities within the falcon config store for the entity type being requested.
This will include
+both scheduled and submitted entity configurations.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -list
+</verbatim>
 
 ---+++Update
 
 Update operation allows an already submitted/scheduled entity to be updated. Cluster update
is currently
-not allowed.
+not allowed. Feed update can cause cascading update to all the processes already scheduled.
The following
+set of actions are performed in Oozie to realize an update.
 
+   * Suspend the previously scheduled Oozie coordinator. This is prevent any new action from
being triggered.
+   * Update the coordinator to set the end time to "now"
+   * Resume the suspended coordiantors
+   * Schedule as per the new process/feed definition with the start time as "now"
+
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -update
+</verbatim>
 
 ---+++Status
 
 Status returns the current status of the entity.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -status
+</verbatim>
 
 ---+++Dependency
 
-With the use of dependency option, we can list all the entities on which the specified entity
is dependent. For example for a feed, dependency return the cluster name and for process it
returns all the input feeds, output feeds and cluster names.
+Returns the dependencies of the requested entity. Dependency list include both forward and
backward
+dependencies (depends on & is dependent on). For ex, a feed would show process that are
dependent on the
+feed and the clusters that it depends on.'
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -dependency
+</verbatim>
 
 ---+++Definition
 
-Definition option returns the entity definition submitted earlier during submit step.
+Gets the current entity definition as stored in the configuration store. Please note that
user documentations
+in the entity will not be retained.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -definition
+</verbatim>
 
 ---++Instance Management Options
 
+Instance Manager gives user the option to control individual instances of the process based
on their instance start time (start time of that instance). Start time needs to be given in
standard TZ format. Example:   01 Jan 2012 01:00  => 2012-01-01T01:00Z
+
+All the instance management operations (except running) allow single instance or list of
instance within a Date range to be acted on. Make sure the dates are valid. i.e are within
the start and  end time of process itself.
+
+For every query in instance management the process name is a compulsory parameter.
+
+Parameters -start and -end are used to mention the date range within which you want the instance
to be operated upon.
+
+-start:   using only  "-start" without  "-end"  will conduct the desired operation only on
single instance given by date along with start.
+
+-end:  "-end"  can only be used along with "-start" . It corresponds to the end date till
which instance need to operated upon.
+
+   * 1. *status*: -status option via CLI can be used to get the status of a single or multiple
instances.  If the instance is not yet materialized but is within the process validity range,
WAITING is returned as the state.Along with the status of the instance log location is also
returned.
+
+
+   * 2.	*running*: -running returns all the running instance of the process. It does not
take any start or end dates but simply return all the instances in state RUNNING at that given
time.
+
+   * 3.	*rerun*: -rerun is the option that you will use most often from instance management.
As the name suggest this option is used to rerun a particular instance or instances of the
process. The rerun option reruns all parent workflow for the instance, which in turn rerun
all the sub-workflows for it. This option is valid for any instance in terminal state, i.e.
KILLED, SUCCEEDED, FAILED. User can also set properties in the request, which will give options
what types of actions should be rerun like, only failed, run all etc. These properties are
dependent on the workflow engine being used along with falcon.
+
+   * 4. *suspend*: -suspend is used to suspend a instance or instances  for the given process.
This option pauses the parent workflow at the state, which it was in at the time of execution
of this command. This command is similar to SUSPEND process command in functionality only
difference being, SUSPEND process suspends all the instance whereas suspend instance suspend
only that instance or instances in the range.
+
+   * 5.	*resume*: -resume option is used to resume any instance that  is in suspended state. (Note: due to a bug in oozie -resume option in some cases may not actually resume the suspended instance/ instances)
+   * 6. *kill*: -kill option can be used to kill an instance or multiple instances
+
+
+In all the cases where your request is syntactically correct but logically not, the instance
/ instances are returned with the same status as earlier. Example:  trying to resume a KILLED
 / SUCCEEDED instance will return the instance with KILLED / SUCCEEDED, without actually performing
any operation. This is so because only an instance in SUSPENDED state can be resumed. Same
thing is valid for rerun a SUSPENDED or RUNNING options etc.
+
+---+++Status
+
+Status option via CLI can be used to get the status of a single or multiple instances.  If
the instance is not yet materialized but is within the process validity range, WAITING is
returned as the state. Along with the status of the instance time is also returned. Log location
gives the oozie workflow url
+If the instance is in WAITING state, missing dependencies are listed
+
+Example : Suppose a process has 3 instance, one has succeeded,one is in running state and
other one is waiting, the expected output is:
+
+{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"},
{"instance":"2010-01-02T11:05Z","status":"WAITING"}]
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
+
 ---+++Kill
 
 Kill sub-command is used to kill all the instances of the specified process whose nominal
time is between the given start time and end time.
@@ -94,73 +175,79 @@ Example:   01 Jan 2012 01:00  => 2012-01
 
 3. Process name is compulsory parameter for each instance management command.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -kill -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
 
 ---+++Suspend
 
 Suspend is used to suspend a instance or instances  for the given process. This option pauses
the parent workflow at the state, which it was in at the time of execution of this command.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -suspend -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
 
 ---+++Continue
 
 Continue option is used to continue the failed workflow instance. This option is valid only
for process instances in terminal state, i.e. SUCCEDDED, KILLED or FAILED.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -re-run -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
 
 ---+++Rerun
 
 Rerun option is used to rerun instances of a given process. This option is valid only for
process instances in terminal state, i.e. SUCCEDDED, KILLED or FAILED. Optionally, you can
specify the properties to override.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -re-run -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" [-file <<properties file>>]
+</verbatim>
 
 ---+++Resume
 
 Resume option is used to resume any instance that  is in suspended state.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -resume -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
-
----+++Status
-
-Status option via CLI can be used to get the status of a single or multiple instances.  If
the instance is not yet materialized but is within the process validity range, WAITING is
returned as the state. Along with the status of the instance time is also returned. Log location
gives the oozie workflow url
-If the instance is in WAITING state, missing dependencies are listed
-
-Example : Suppose a process has 3 instance, one has succeeded,one is in running state and
other one is waiting, the expected output is:
-
-{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"},
{"instance":"2010-01-02T11:05Z","status":"WAITING"}] 
-
-Usage:
-$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
 
 ---+++Running
 
 Running option provides all the running instances of the mentioned process.
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -running
+</verbatim>
 
 ---+++Logs
 
 Get logs for instance actions
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -logs -start "yyyy-MM-dd'T'HH:mm'Z'" [-end "yyyy-MM-dd'T'HH:mm'Z'"] [-runid <<runid>>]
-
+</verbatim>
 
 ---++Admin Options
 
 ---+++Help
 
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon admin -version
+</verbatim>
 
 ---+++Version
 
 Version returns the current verion of Falcon installed.
+
+<verbatim>
 Usage:
 $FALCON_HOME/bin/falcon admin -help
+</verbatim>
\ No newline at end of file
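
Note on the instance usage lines in FalconCLI.twiki above: they take -start and -end values in the form yyyy-MM-dd'T'HH:mm'Z', and the text gives the example 01 Jan 2012 01:00 => 2012-01-01T01:00Z. The Python sketch below shows one way a wrapper script might format such values and assemble the argument list. It is illustrative only: `to_falcon_time` and `instance_command` are hypothetical helpers, the install path is a placeholder, and reading the trailing Z as UTC is an assumption. Only flags that appear in the usage lines are used.

```python
from datetime import datetime, timezone
from typing import List, Optional

# strftime equivalent of the yyyy-MM-dd'T'HH:mm'Z' pattern in the usage lines.
FALCON_TIME_FORMAT = "%Y-%m-%dT%H:%MZ"


def to_falcon_time(moment: datetime) -> str:
    """Format a datetime the way the -start/-end options expect."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime(FALCON_TIME_FORMAT)


def instance_command(falcon_home: str, entity_type: str, name: str, action: str,
                     start: datetime, end: Optional[datetime] = None) -> List[str]:
    """Build an argv list such as: falcon instance -type process -name X -status -start ..."""
    argv = [f"{falcon_home}/bin/falcon", "instance",
            "-type", entity_type, "-name", name,
            f"-{action}", "-start", to_falcon_time(start)]
    if end is not None:
        # Per the text, -end is only meaningful together with -start.
        argv += ["-end", to_falcon_time(end)]
    return argv


if __name__ == "__main__":
    print(to_falcon_time(datetime(2012, 1, 1, 1, 0)))  # 2012-01-01T01:00Z
    # "/opt/falcon" is a placeholder for $FALCON_HOME.
    print(instance_command("/opt/falcon", "process", "sampleProcess", "status",
                           datetime(2012, 1, 1, 1, 0), datetime(2012, 1, 1, 2, 0)))
```

Returning a list rather than a joined string keeps the sketch usable with `subprocess.run` without shell quoting concerns.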

Modified: incubator/falcon/trunk/src/site/twiki/index.twiki
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/src/site/twiki/index.twiki?rev=1491876&r1=1491875&r2=1491876&view=diff
==============================================================================
--- incubator/falcon/trunk/src/site/twiki/index.twiki (original)
+++ incubator/falcon/trunk/src/site/twiki/index.twiki Tue Jun 11 16:59:54 2013
@@ -15,7 +15,7 @@ configurations are expressed in such a w
 explicitly described. This information about inter-dependencies between various entities
allows Falcon
 to orchestrate and manage various data management functions.
 
-Falcon was successfully accepted as an incubation project in April 2013 and is now in apache
incubation.
+Falcon was accepted as an incubation project in April 2013 and is now in apache incubation.
 
 
 <div id="components" class="carousel slide">
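
Note on the sample -status response quoted in the FalconCLI.twiki changes of this commit: it is a JSON object with a top-level status and message plus an instances array, and as printed it appears to be missing its final closing brace. The Python sketch below groups instance times by status using a well-formed copy of that sample. The function name `instances_by_status` is hypothetical, and the field names are taken from the sample only, not from a documented schema.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Well-formed copy of the sample response from the text (closing brace added).
SAMPLE_RESPONSE = (
    '{"status":"SUCCEEDED","message":"getStatus is successful","instances":['
    '{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},'
    '{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"},'
    '{"instance":"2010-01-02T11:05Z","status":"WAITING"}]}'
)


def instances_by_status(response_text: str) -> Dict[str, List[str]]:
    """Group instance nominal times by their reported status."""
    payload = json.loads(response_text)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in payload.get("instances", []):
        grouped[item["status"]].append(item["instance"])
    return dict(grouped)


if __name__ == "__main__":
    for status, times in instances_by_status(SAMPLE_RESPONSE).items():
        print(status, times)
```

Note that the WAITING entry in the sample carries no logFile field, so code reading that field should treat it as optional.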


