falcon-commits mailing list archives

From srik...@apache.org
Subject svn commit: r1515884 [2/2] - in /incubator/falcon/trunk: ./ general/ general/src/ general/src/site/ general/src/site/twiki/docs/ general/src/site/twiki/wiki/ releases/ releases/0.3-incubating/ releases/0.3-incubating/src/ releases/0.3-incubating/src/si...
Date Tue, 20 Aug 2013 17:04:55 GMT
Added: incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/FalconCLI.twiki
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/FalconCLI.twiki?rev=1515884&view=auto
==============================================================================
--- incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/FalconCLI.twiki (added)
+++ incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/FalconCLI.twiki Tue
Aug 20 17:04:54 2013
@@ -0,0 +1,253 @@
+---+FalconCLI
+
+FalconCLI is an interface between the user and Falcon. It is a command line utility provided by Falcon. FalconCLI supports Entity Management, Instance Management and Admin operations. FalconCLI uses a set of web services to interact with Falcon.
+
+---++Entity Management Operations
+
+---+++Submit
+
+The entity submit action allows a new cluster/feed/process to be set up within Falcon. A submitted entity is not scheduled; it is simply stored in the configuration store within Falcon. Besides validating the entity against the schema for its type, Falcon also performs inter-field validations within the configuration file and validations across dependent entities.
+
+<verbatim>
+Example: 
+$FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml
+</verbatim>
+
+Note: The url option is optional in the above and all subsequent commands. If it is omitted, the URL is read from the client.properties file. If it is neither given on the command line nor set in client.properties, Falcon CLI fails.
+
+---+++Schedule
+
+Feeds or processes that are already submitted and present in the config store can be scheduled. Upon schedule, Falcon wraps the required repeatable action as a bundle of Oozie coordinators and executes them on the Oozie scheduler. (It is possible to extend Falcon to use a workflow engine other than Oozie.) Falcon overrides the workflow instance's external id in Oozie to reflect the process/feed and the nominal time. This external id can then be used for instance management functions.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon entity  -type [process|feed] -name <<name>> -schedule
+
+Example:
+$FALCON_HOME/bin/falcon entity  -type process -name sampleProcess -schedule
+</verbatim>
+
+---+++Suspend
+
+This action applies only to a scheduled entity. It triggers a suspend on the Oozie bundle that was scheduled earlier through the schedule function. No further instances are executed on a suspended process/feed.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -suspend
+</verbatim>
+
+---+++Resume
+
+Puts a suspended process/feed back into the active state, which in turn resumes the corresponding Oozie bundle.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -resume
+</verbatim>
+
+---+++Delete
+
+The delete operation removes any scheduled activity for the entity on the workflow engine and removes the entity from the Falcon configuration store. Delete succeeds only if no other entities depend on the entity being deleted.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon entity  -type [cluster|feed|process] -name <<name>> -delete
+</verbatim>
+
+---+++List
+
+List all the entities within the falcon config store for the entity type being requested.
This will include
+both scheduled and submitted entity configurations.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -list
+</verbatim>
+
+---+++Update
+
+Update operation allows an already submitted/scheduled entity to be updated. Cluster update
is currently
+not allowed. Feed update can cause cascading update to all the processes already scheduled.
The following
+set of actions are performed in Oozie to realize an update.
+
+   * Suspend the previously scheduled Oozie coordinator. This is to prevent any new action from being triggered.
+   * Update the coordinator to set the end time to "now"
+   * Resume the suspended coordinators
+   * Schedule as per the new process/feed definition with the start time as "now"
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -update
+</verbatim>
+
+---+++Status
+
+Status returns the current status of the entity.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -status
+</verbatim>
+
+---+++Dependency
+
+Returns the dependencies of the requested entity. The dependency list includes both forward and backward dependencies (depends on & is dependent on). For example, a feed would show the processes that are dependent on the feed and the clusters that it depends on.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -dependency
+</verbatim>
+
+---+++Definition
+
+Gets the current entity definition as stored in the configuration store. Please note that user documentation in the entity will not be retained.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -definition
+</verbatim>
+
+---++Instance Management Options
+
+Instance Manager gives the user the option to control individual instances of the process based on their instance start time (the start time of that instance). The start time needs to be given in standard TZ format. Example:   01 Jan 2012 01:00  => 2012-01-01T01:00Z
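+
+A minimal sketch of that conversion in Python, assuming the caller already holds the instance start time as a timezone-aware datetime. The helper name =to_falcon_time= is hypothetical and is not part of Falcon.
+

```python
from datetime import datetime, timezone


def to_falcon_time(moment: datetime) -> str:
    """Format a datetime as yyyy-MM-dd'T'HH:mm'Z' (hypothetical helper, not a Falcon API)."""
    # Convert to UTC first so that the trailing Z is accurate.
    # Note: a naive datetime would be interpreted as local time by astimezone().
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


# 01 Jan 2012 01:00 UTC, the example used in the text.
print(to_falcon_time(datetime(2012, 1, 1, 1, 0, tzinfo=timezone.utc)))
```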
+
+All the instance management operations (except running) allow a single instance or a list of instances within a date range to be acted on. Make sure the dates are valid, i.e. they are within the start and end time of the process itself.
+
+For every query in instance management the process name is a compulsory parameter.
+
+Parameters -start and -end specify the date range within which instances are to be operated upon.
+
+-start:   using only "-start" without "-end" conducts the desired operation only on the single instance given by the date passed with -start.
+
+-end:  "-end" can only be used along with "-start". It corresponds to the end date up to which instances need to be operated upon.
+
+   * 1. *status*: the -status option can be used to get the status of a single instance or multiple instances. If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Along with the status of the instance, the log location is also returned.
+
+
+   * 2. *running*: -running returns all the running instances of the process. It does not take any start or end dates but simply returns all the instances in state RUNNING at that given time.
+
+   * 3. *rerun*: -rerun is the option that you will use most often from instance management. As the name suggests, this option is used to rerun a particular instance or instances of the process. The rerun option reruns all parent workflows for the instance, which in turn rerun all of their sub-workflows. This option is valid for any instance in a terminal state, i.e. KILLED, SUCCEEDED or FAILED. The user can also set properties in the request to choose which actions should be rerun, for example only the failed ones or all of them. These properties depend on the workflow engine being used along with Falcon.
+
+   * 4. *suspend*: -suspend is used to suspend an instance or instances for the given process. This option pauses the parent workflow in the state it was in when the command was executed. It is similar in functionality to the SUSPEND process command; the only difference is that SUSPEND process suspends all the instances, whereas suspend instance suspends only the given instance or the instances in the range.
+
+   * 5. *resume*: the -resume option is used to resume any instance that is in suspended state. (Note: due to a bug in Oozie, the -resume option may in some cases not actually resume the suspended instance/instances.)
+   * 6. *kill*: -kill option can be used to kill an instance or multiple instances
+
+
+In all the cases where your request is syntactically correct but logically not, the instance or instances are returned with the same status as earlier. Example: trying to resume a KILLED / SUCCEEDED instance will return the instance with KILLED / SUCCEEDED, without actually performing any operation. This is because only an instance in SUSPENDED state can be resumed. The same holds for other invalid combinations, such as rerunning a SUSPENDED or RUNNING instance.
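+
+The rule above can be sketched as a small lookup table. This is an illustrative model only, not Falcon's implementation: it covers just the two operations whose valid states are stated in this document (resume and rerun), the function names are hypothetical, and the RUNNING result for an accepted operation is an assumption.
+

```python
TERMINAL_STATES = {"KILLED", "SUCCEEDED", "FAILED"}

# Operation -> instance states in which the text says the operation is valid.
VALID_STATES = {
    "resume": {"SUSPENDED"},
    "rerun": TERMINAL_STATES,
}


def is_applicable(operation: str, status: str) -> bool:
    """True when the operation would actually act on an instance in this status."""
    return status in VALID_STATES[operation]


def reported_status(operation: str, status: str) -> str:
    """Status returned for the instance after the request (hypothetical model)."""
    if not is_applicable(operation, status):
        # Syntactically valid but logically invalid: nothing is done and
        # the instance comes back with its earlier status.
        return status
    # Assumption: an accepted resume or rerun puts the instance back to RUNNING.
    return "RUNNING"


print(reported_status("resume", "KILLED"))
print(reported_status("rerun", "FAILED"))
```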
+
+---+++Status
+
+The status option can be used to get the status of a single instance or multiple instances. If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Along with the status of the instance, the instance time is also returned. Log location gives the Oozie workflow url.
+If the instance is in WAITING state, missing dependencies are listed.
+
+Example: Suppose a process has 3 instances: one has succeeded, one is in running state and the other one is waiting. The expected output is:
+
+{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"},
{"instance":"2010-01-02T11:05Z","status":"WAITING"}]}
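+
+A response of this shape can be summarised on the client side. The sketch below is illustrative: it assumes the response is well-formed JSON with the field names shown in the example above, and the function name =instances_by_status= is hypothetical.
+

```python
import json
from collections import defaultdict

# Sample response, copied from the example and closed as a complete JSON object.
RESPONSE = """
{"status":"SUCCEEDED","message":"getStatus is successful","instances":[
 {"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},
 {"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"},
 {"instance":"2010-01-02T11:05Z","status":"WAITING"}]}
"""


def instances_by_status(response_text: str) -> dict:
    """Group instance times by their reported status."""
    payload = json.loads(response_text)
    grouped = defaultdict(list)
    for entry in payload.get("instances", []):
        grouped[entry["status"]].append(entry["instance"])
    return dict(grouped)


for status, instances in instances_by_status(RESPONSE).items():
    print(status, instances)
```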
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>>
-status -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
+
+---+++Kill
+
+Kill sub-command is used to kill all the instances of the specified process whose nominal
time is between the given start time and end time.
+
+Note: 
+1. For all the instance management sub-commands, if end time is not specified, Falcon will
perform the actions on all the instances whose instance time falls after the start time.
+
+2. The start time and end time need to be specified in TZ format.
+Example:   01 Jan 2012 01:00  => 2012-01-01T01:00Z
+
+3. Process name is a compulsory parameter for each instance management command.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>>
-kill -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
+
+---+++Suspend
+
+Suspend is used to suspend an instance or instances for the given process. This option pauses the parent workflow in the state it was in when this command was executed.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>>
-suspend -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
+
+---+++Continue
+
+Continue option is used to continue the failed workflow instance. This option is valid only for process instances in terminal state, i.e. SUCCEEDED, KILLED or FAILED.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>>
-continue -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
+
+---+++Rerun
+
+Rerun option is used to rerun instances of a given process. This option is valid only for process instances in terminal state, i.e. SUCCEEDED, KILLED or FAILED. Optionally, you can specify the properties to override.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>>
-rerun -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" [-file <<properties file>>]
+</verbatim>
+
+---+++Resume
+
+Resume option is used to resume any instance that  is in suspended state.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>>
-resume -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
+</verbatim>
+
+---+++Running
+
+Running option provides all the running instances of the mentioned process.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>>
-running
+</verbatim>
+
+---+++Logs
+
+Get logs for instance actions
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>>
-logs -start "yyyy-MM-dd'T'HH:mm'Z'" [-end "yyyy-MM-dd'T'HH:mm'Z'"] [-runid <<runid>>]
+</verbatim>
+
+---++Admin Options
+
+---+++Help
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon admin -help
+</verbatim>
+
+---+++Version
+
+Version returns the current version of Falcon installed.
+
+<verbatim>
+Usage:
+$FALCON_HOME/bin/falcon admin -version
+</verbatim>
\ No newline at end of file

Added: incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/InstallationSteps.twiki
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/InstallationSteps.twiki?rev=1515884&view=auto
==============================================================================
--- incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/InstallationSteps.twiki
(added)
+++ incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/InstallationSteps.twiki
Tue Aug 20 17:04:54 2013
@@ -0,0 +1,86 @@
+---++ Building & Installing Falcon
+
+
+---+++ Building Falcon
+
+Download sources from http://www.apache.org/dist/incubator/falcon/0.3-incubating/falcon-0.3-incubating-sources.tar.gz
+<verbatim>
+
+tar -xzvf falcon-0.3-incubating-sources.tar.gz
+
+cd falcon-0.3-incubating-sources
+
+export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean assembly:assembly
-DskipTests -DskipCheck=true
+
+[optionally -Dhadoop.version=<<hadoop.version>> can be appended to build for
a specific version of hadoop]
+[Falcon has currently not been tested with secure Hadoop / Hadoop 2.0]
+</verbatim>
+Tar can be found in target/falcon-0.3-incubating-bin.tar.gz
+
+Tar is structured as follows
+
+<verbatim>
+
+|- bin
+   |- falcon
+   |- falcon-start
+   |- falcon-stop
+|- conf
+   |- startup.properties
+   |- runtime.properties
+   |- client.properties
+   |- log4j.xml
+|- src
+|- docs
+|- client
+   |- lib (client support libs)
+|- server
+   |- webapp
+      |- classes (server support classes)
+      |- lib (server support libs)
+   |- falcon.war
+|- logs (application log files & temp data files)
+   |- falcon.pid
+
+</verbatim>
+
+Note: By default, Falcon is built for embedded mode.
+
+---+++ Installing & running Falcon
+
+*Installing falcon*
+<verbatim>
+tar -xzvf falcon-0.3-incubating-bin.tar.gz
+cd falcon-0.3-incubating-bin
+</verbatim>
+
+*Starting Falcon Server*
+<verbatim>
+bin/falcon-start
+</verbatim>
+
+*Using Falcon*
+<verbatim>
+bin/falcon admin -version
+Falcon server build version: {Version:"0.3-incubating-r4380d446a912252a8c173c43a858ab1a38443c47",Mode:"embedded"}
+
+----
+
+bin/falcon help
+(for more details about falcon cli usage)
+</verbatim>
+
+*Stopping Falcon Server*
+<verbatim>
+bin/falcon-stop
+</verbatim>
+
+---+++ Preparing oozie bundle for use with Falcon
+<verbatim>
+cd <<project home>>
+mkdir target/package
+src/bin/pacakge.sh <<hadoop-version>>
+
+>> ex. src/bin/pacakge.sh 1.1.2 or src/bin/pacakge.sh 0.20.2-cdh3u5
+>> oozie bundle available in target/package/oozie-3.2.0-incubating/distro/target/oozie-3.2.2-distro.tar.gz
+</verbatim>
\ No newline at end of file

Added: incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/OnBoarding.twiki
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/OnBoarding.twiki?rev=1515884&view=auto
==============================================================================
--- incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/OnBoarding.twiki (added)
+++ incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/docs/OnBoarding.twiki Tue
Aug 20 17:04:54 2013
@@ -0,0 +1,264 @@
+---++ Contents
+   * <a href="#Onboarding Steps">Onboarding Steps</a>
+   * <a href="#Sample Pipeline">Sample Pipeline</a>
+
+---+++ Onboarding Steps
+   * Create cluster definition for the cluster, specifying name node, job tracker, workflow
engine endpoint, messaging endpoint. Refer to [[EntitySpecification][cluster definition]]
for details.
+   * Create Feed definitions for each of the input and output specifying frequency, data
path, ownership. Refer to [[EntitySpecification][feed definition]] for details.
+   * Create Process definition for your job. Process defines configuration for the workflow
job. Important attributes are frequency, inputs/outputs and workflow path. Refer to [[EntitySpecification][process
definition]] for process details.
+   * Define workflow for your job using the workflow engine (only Oozie is supported as of now). Refer to [[http://incubator.apache.org/oozie/docs/3.1.3/docs/WorkflowFunctionalSpec.html][Oozie Workflow Specification]]. The libraries required for the workflow should be available in the lib folder under the workflow path.
+   * Set-up workflow definition, libraries and referenced scripts on hadoop. 
+   * Submit cluster definition
+   * Submit and schedule feed and process definitions
+   
+
+---+++ Sample Pipeline
+---++++ Cluster   
+Cluster definition that contains end points for name node, job tracker, oozie and jms server:
+<verbatim>
+<?xml version="1.0"?>
+<!--
+    Cluster configuration
+  -->
+<cluster colo="ua2" description="" name="corp" xmlns="uri:falcon:cluster:0.1"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">    
+    <interfaces>
+        <interface type="readonly" endpoint="hftp://name-node.com:50070" version="0.20.2-cdh3u0"
/>
+
+        <interface type="write" endpoint="hdfs://name-node.com:54310" version="0.20.2-cdh3u0"
/>
+
+        <interface type="execute" endpoint="job-tracker:54311" version="0.20.2-cdh3u0"
/>
+
+        <interface type="workflow" endpoint="http://oozie.com:11000/oozie/" version="3.1.4"
/>
+
+        <interface type="messaging" endpoint="tcp://jms-server.com:61616?daemon=true"
version="5.1.6" />
+    </interfaces>
+
+    <locations>
+        <location name="staging" path="/projects/falcon/staging" />
+        <location name="temp" path="/tmp" />
+        <location name="working" path="/projects/falcon/working" />
+    </locations>
+</cluster>
+</verbatim>
+   
+---++++ Input Feed
+Hourly feed that defines feed path, frequency, ownership and validity:
+<verbatim>
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+    Hourly sample input data
+  -->
+
+<feed description="sample input data" name="SampleInput" xmlns="uri:falcon:feed:0.1"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+    <groups>group</groups>
+
+    <frequency>hours(1)</frequency>
+
+    <late-arrival cut-off="hours(6)" />
+
+    <clusters>
+        <cluster name="corp" type="source">
+            <validity start="2009-01-01T00:00Z" end="2099-12-31T00:00Z" timezone="UTC"
/>
+            <retention limit="months(24)" action="delete" />
+        </cluster>
+    </clusters>
+
+    <locations>
+        <location type="data" path="/projects/bootcamp/data/${YEAR}-${MONTH}-${DAY}-${HOUR}/SampleInput"
/>
+        <location type="stats" path="/projects/bootcamp/stats/SampleInput" />
+        <location type="meta" path="/projects/bootcamp/meta/SampleInput" />
+    </locations>
+
+    <ACL owner="suser" group="users" permission="0755" />
+
+    <schema location="/none" provider="none" />
+</feed>
+</verbatim>
+
+---++++ Output Feed
+Daily feed that defines feed path, frequency, ownership and validity:
+<verbatim>
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+    Daily sample output data
+  -->
+
+<feed description="sample output data" name="SampleOutput" xmlns="uri:falcon:feed:0.1"
+xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+    <groups>group</groups>
+
+    <frequency>days(1)</frequency>
+
+    <late-arrival cut-off="hours(6)" />
+
+    <clusters>
+        <cluster name="corp" type="source">
+            <validity start="2009-01-01T00:00Z" end="2099-12-31T00:00Z" timezone="UTC"
/>
+            <retention limit="months(24)" action="delete" />
+        </cluster>
+    </clusters>
+
+    <locations>
+        <location type="data" path="/projects/bootcamp/output/${YEAR}-${MONTH}-${DAY}/SampleOutput"
/>
+        <location type="stats" path="/projects/bootcamp/stats/SampleOutput" />
+        <location type="meta" path="/projects/bootcamp/meta/SampleOutput" />
+    </locations>
+
+    <ACL owner="suser" group="users" permission="0755" />
+
+    <schema location="/none" provider="none" />
+</feed>
+</verbatim>
+
+---++++ Process
+Sample process which runs daily at the 6th hour on the corp cluster. It takes one input, SampleInput, for the previous day (24 instances). It generates one output, SampleOutput, for the previous day. The workflow is defined at /projects/bootcamp/workflow/workflow.xml. Any libraries required by the workflow should be at /projects/bootcamp/workflow/lib. The process also defines the properties queueName, ssh.host and fileTimestamp, which are passed to the workflow. In addition, Falcon exposes the following properties to the workflow: nameNode, jobTracker (hadoop properties), input and output (Input/Output properties).
+
+<verbatim>
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+    Daily sample process. Runs at 6th hour every day. Input - last day's hourly data. Generates
output for yesterday
+ -->
+<process name="SampleProcess">
+    <cluster name="corp" />
+
+    <frequency>days(1)</frequency>
+
+    <validity start="2012-04-03T06:00Z" end="2022-12-30T00:00Z" timezone="UTC" />
+
+    <inputs>
+        <input name="input" feed="SampleInput" start="yesterday(0,0)" end="today(-1,0)"
/>
+    </inputs>
+
+    <outputs>
+        <output name="output" feed="SampleOutput" instance="yesterday(0,0)" />
+    </outputs>
+
+    <properties>
+        <property name="queueName" value="reports" />
+        <property name="ssh.host" value="host.com" />
+        <property name="fileTimestamp" value="${coord:formatTime(coord:nominalTime(),
'yyyy-MM-dd')}" />
+    </properties>
+
+    <workflow engine="oozie" path="/projects/bootcamp/workflow" />
+
+    <retry policy="backoff" delay="minutes(5)" attempts="3" />
+    
+    <late-process policy="exp-backoff" delay="hours(1)">
+        <late-input input="input" workflow-path="/projects/bootcamp/workflow/lateinput"
/>
+    </late-process>
+</process>
+</verbatim>
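+
+The input range of the process above, =yesterday(0,0)= to =today(-1,0)=, can be checked with a short Python sketch. It assumes that =yesterday(h,m)= and =today(h,m)= are offsets from 00:00 of the previous and current day relative to the nominal time, and that both ends of the range are inclusive; these semantics are assumptions here, and the helper functions are illustrative rather than Falcon APIs.
+

```python
from datetime import datetime, timedelta


def day_start(nominal: datetime) -> datetime:
    """00:00 of the day that contains the nominal time."""
    return nominal.replace(hour=0, minute=0, second=0, microsecond=0)


def yesterday(nominal: datetime, hours: int, minutes: int) -> datetime:
    # Assumed semantics: offset from 00:00 of the previous day.
    return day_start(nominal) - timedelta(days=1) + timedelta(hours=hours, minutes=minutes)


def today(nominal: datetime, hours: int, minutes: int) -> datetime:
    # Assumed semantics: offset from 00:00 of the current day.
    return day_start(nominal) + timedelta(hours=hours, minutes=minutes)


def hourly_instances(start: datetime, end: datetime) -> list:
    """Hourly feed instances from start to end, both inclusive."""
    instances = []
    current = start
    while current <= end:
        instances.append(current)
        current += timedelta(hours=1)
    return instances


# Process instance with nominal time 2012-04-04T06:00Z.
nominal = datetime(2012, 4, 4, 6, 0)
window = hourly_instances(yesterday(nominal, 0, 0), today(nominal, -1, 0))
print(len(window), window[0].isoformat(), window[-1].isoformat())
```

+Under these assumptions the range covers the 24 hourly SampleInput instances of the previous day, from 00:00 to 23:00.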
+
+---++++ Oozie Workflow
+The sample user workflow contains 3 actions:
+   * Pig action - Executes pig script /projects/bootcamp/workflow/script.pig
+   * concatenator - Java action that concatenates part files and generates a single file
+   * file upload - ssh action that gets the concatenated file from hadoop and sends the file
to a remote host
+   
+<verbatim>
+<workflow-app xmlns="uri:oozie:workflow:0.2" name="sample-wf">
+        <start to="pig" />
+
+        <action name="pig">
+                <pig>
+                        <job-tracker>${jobTracker}</job-tracker>
+                        <name-node>${nameNode}</name-node>
+                        <prepare>
+                                <delete path="${output}"/>
+                        </prepare>
+                        <configuration>
+                                <property>
+                                        <name>mapred.job.queue.name</name>
+                                        <value>${queueName}</value>
+                                </property>
+                                <property>
+                                        <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
+                                        <value>true</value>
+                                </property>
+                        </configuration>
+                        <script>${nameNode}/projects/bootcamp/workflow/script.pig</script>
+                        <param>input=${input}</param>
+                        <param>output=${output}</param>
+                        <file>lib/dependent.jar</file>
+                </pig>
+                <ok to="concatenator" />
+                <error to="fail" />
+        </action>
+
+        <action name="concatenator">
+                <java>
+                        <job-tracker>${jobTracker}</job-tracker>
+                        <name-node>${nameNode}</name-node>
+                        <prepare>
+                                <delete path="${nameNode}/projects/bootcamp/concat/data-${fileTimestamp}.csv"/>
+                        </prepare>
+                        <configuration>
+                                <property>
+                                        <name>mapred.job.queue.name</name>
+                                        <value>${queueName}</value>
+                                </property>
+                        </configuration>
+                        <main-class>com.wf.Concatenator</main-class>
+                        <arg>${output}</arg>
+                        <arg>${nameNode}/projects/bootcamp/concat/data-${fileTimestamp}.csv</arg>
+                </java>
+                <ok to="fileupload" />
+                <error to="fail"/>
+        </action>
+                        
+        <action name="fileupload">
+                <ssh>
+                        <host>localhost</host>
+                        <command>/tmp/fileupload.sh</command>
+                        <args>${nameNode}/projects/bootcamp/concat/data-${fileTimestamp}.csv</args>
+                        <args>${wf:conf("ssh.host")}</args>
+                        <capture-output/>
+                </ssh>
+                <ok to="fileUploadDecision" />
+                <error to="fail"/>
+        </action>
+
+        <decision name="fileUploadDecision">
+                <switch>
+                        <case to="end">
+                                ${wf:actionData('fileupload')['output'] == '0'}
+                        </case>
+                        <default to="fail"/>
+                </switch>
+        </decision>
+
+        <kill name="fail">
+                <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
+        </kill>
+
+        <end name="end" />
+</workflow-app>
+</verbatim>
+
+---++++ File Upload Script
+The script gets the file from hadoop, rsyncs the file to /tmp on the remote host and deletes the file from hadoop:
+<verbatim>
+#!/bin/bash
+
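+# On failure or interrupt, report the failing exit code (read by Oozie via capture-output) and exit with it
+trap 'rc=$?; echo "output=$rc"; exit $rc' ERR INT TERM
+
+echo "Arguments: $@"
+SRCFILE=$1    # file on hadoop to upload
+DESTHOST=$2   # remote host, passed as the second <args> value in the workflow
+
+FILENAME=$(basename "$SRCFILE")
+rm -f "/tmp/$FILENAME"
+hadoop fs -copyToLocal "$SRCFILE" /tmp/
+echo "Copied $SRCFILE to /tmp"
+
+rsync -ztv --rsh=ssh --stats "/tmp/$FILENAME" "$DESTHOST:/tmp"
+echo "rsynced $FILENAME to $DESTHOST:/tmp"
+
+hadoop fs -rmr "$SRCFILE"
+echo "Deleted $SRCFILE"
+
+rm -f "/tmp/$FILENAME"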
+echo "output=0"
+</verbatim>

Added: incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/index.twiki
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/index.twiki?rev=1515884&view=auto
==============================================================================
--- incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/index.twiki (added)
+++ incubator/falcon/trunk/releases/0.3-incubating/src/site/twiki/index.twiki Tue Aug 20 17:04:54
2013
@@ -0,0 +1,11 @@
+---+++ Contents
+
+   * <a href="./docs/InstallationSteps.html">Simple setup</a>
+
+   * <a href="./docs/FalconArchitecture.html">Overview</a>
+
+   * <a href="./docs/OnBoarding.html">On boarding</a>
+
+   * <a href="./docs/EntitySpecification.html">Entity specification</a>
+
+   * <a href="./docs/FalconCLI.html">CLI</a>

Added: incubator/falcon/trunk/releases/pom.xml
URL: http://svn.apache.org/viewvc/incubator/falcon/trunk/releases/pom.xml?rev=1515884&view=auto
==============================================================================
--- incubator/falcon/trunk/releases/pom.xml (added)
+++ incubator/falcon/trunk/releases/pom.xml Tue Aug 20 17:04:54 2013
@@ -0,0 +1,37 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+  
+       http://www.apache.org/licenses/LICENSE-2.0
+  
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+    <parent>
+        <groupId>org.apache.falcon</groupId>
+        <artifactId>falcon-website</artifactId>
+        <version>0.4-SNAPSHOT</version>
+    </parent>
+    <artifactId>falcon-website-releases</artifactId>
+    <version>0.1</version>
+    <packaging>pom</packaging>
+
+    <name>Apache Falcon - Release documentation</name>
+
+    <modules>
+        <module>0.3-incubating</module>
+    </modules>
+
+</project>


