falcon-commits mailing list archives

From pall...@apache.org
Subject svn commit: r1730449 [1/3] - in /falcon/trunk: ./ general/ general/src/site/ general/src/site/twiki/ general/src/site/twiki/falconcli/ general/src/site/twiki/restapi/ releases/
Date Mon, 15 Feb 2016 05:48:01 GMT
Author: pallavi
Date: Mon Feb 15 05:48:00 2016
New Revision: 1730449

URL: http://svn.apache.org/viewvc?rev=1730449&view=rev
Log:
Updating docs under trunk for 0.9 release

Added:
    falcon/trunk/general/src/site/twiki/Configuration.twiki
    falcon/trunk/general/src/site/twiki/Distributed-mode.twiki
    falcon/trunk/general/src/site/twiki/Embedded-mode.twiki
    falcon/trunk/general/src/site/twiki/FalconEmailNotification.twiki
    falcon/trunk/general/src/site/twiki/FalconNativeScheduler.twiki
    falcon/trunk/general/src/site/twiki/HDFSDR.twiki
    falcon/trunk/general/src/site/twiki/HiveDR.twiki
    falcon/trunk/general/src/site/twiki/ImportExport.twiki
    falcon/trunk/general/src/site/twiki/falconcli/
    falcon/trunk/general/src/site/twiki/falconcli/CommonCLI.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ContinueInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/Definition.twiki
    falcon/trunk/general/src/site/twiki/falconcli/DeleteEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/DependencyEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/DependencyInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/EdgeMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/FalconCLI.twiki
    falcon/trunk/general/src/site/twiki/falconcli/FeedInstanceListing.twiki
    falcon/trunk/general/src/site/twiki/falconcli/HelpAdmin.twiki
    falcon/trunk/general/src/site/twiki/falconcli/KillInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/LifeCycleInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/LineageMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ListEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ListInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ListMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/LogsInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/Lookup.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ParamsInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/RelationMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/RerunInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ResumeEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/ResumeInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/RunningInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SLAAlert.twiki
    falcon/trunk/general/src/site/twiki/falconcli/Schedule.twiki
    falcon/trunk/general/src/site/twiki/falconcli/StatusAdmin.twiki
    falcon/trunk/general/src/site/twiki/falconcli/StatusEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/StatusInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/Submit.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SubmitRecipe.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SummaryEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SummaryInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SuspendEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/SuspendInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/Touch.twiki
    falcon/trunk/general/src/site/twiki/falconcli/TriageInstance.twiki
    falcon/trunk/general/src/site/twiki/falconcli/UpdateEntity.twiki
    falcon/trunk/general/src/site/twiki/falconcli/VersionAdmin.twiki
    falcon/trunk/general/src/site/twiki/falconcli/VertexEdgesMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/VertexMetadata.twiki
    falcon/trunk/general/src/site/twiki/falconcli/VerticesMetadata.twiki
    falcon/trunk/general/src/site/twiki/restapi/FeedSLA.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceDependencies.twiki
    falcon/trunk/general/src/site/twiki/restapi/Triage.twiki
Modified:
    falcon/trunk/general/pom.xml
    falcon/trunk/general/src/site/site.xml
    falcon/trunk/general/src/site/twiki/EntitySpecification.twiki
    falcon/trunk/general/src/site/twiki/FalconCLI.twiki
    falcon/trunk/general/src/site/twiki/FalconDocumentation.twiki
    falcon/trunk/general/src/site/twiki/InstallationSteps.twiki
    falcon/trunk/general/src/site/twiki/OnBoarding.twiki
    falcon/trunk/general/src/site/twiki/Operability.twiki
    falcon/trunk/general/src/site/twiki/Recipes.twiki
    falcon/trunk/general/src/site/twiki/Security.twiki
    falcon/trunk/general/src/site/twiki/index.twiki
    falcon/trunk/general/src/site/twiki/restapi/AdjacentVertices.twiki
    falcon/trunk/general/src/site/twiki/restapi/AdminStack.twiki
    falcon/trunk/general/src/site/twiki/restapi/AdminVersion.twiki
    falcon/trunk/general/src/site/twiki/restapi/AllEdges.twiki
    falcon/trunk/general/src/site/twiki/restapi/AllVertices.twiki
    falcon/trunk/general/src/site/twiki/restapi/Edge.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityDefinition.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityDelete.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityDependencies.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityLineage.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityList.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityResume.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntitySchedule.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityStatus.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntitySubmit.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntitySubmitAndSchedule.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntitySummary.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntitySuspend.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityTouch.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityUpdate.twiki
    falcon/trunk/general/src/site/twiki/restapi/EntityValidate.twiki
    falcon/trunk/general/src/site/twiki/restapi/FeedInstanceListing.twiki
    falcon/trunk/general/src/site/twiki/restapi/FeedLookup.twiki
    falcon/trunk/general/src/site/twiki/restapi/Graph.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceKill.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceList.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceLogs.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceParams.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceRerun.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceResume.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceRunning.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceStatus.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceSummary.twiki
    falcon/trunk/general/src/site/twiki/restapi/InstanceSuspend.twiki
    falcon/trunk/general/src/site/twiki/restapi/MetadataList.twiki
    falcon/trunk/general/src/site/twiki/restapi/MetadataRelations.twiki
    falcon/trunk/general/src/site/twiki/restapi/ResourceList.twiki
    falcon/trunk/general/src/site/twiki/restapi/Vertex.twiki
    falcon/trunk/general/src/site/twiki/restapi/VertexProperties.twiki
    falcon/trunk/general/src/site/twiki/restapi/Vertices.twiki
    falcon/trunk/pom.xml
    falcon/trunk/releases/pom.xml

Modified: falcon/trunk/general/pom.xml
URL: http://svn.apache.org/viewvc/falcon/trunk/general/pom.xml?rev=1730449&r1=1730448&r2=1730449&view=diff
==============================================================================
--- falcon/trunk/general/pom.xml (original)
+++ falcon/trunk/general/pom.xml Mon Feb 15 05:48:00 2016
@@ -22,10 +22,10 @@
     <parent>
         <groupId>org.apache.falcon</groupId>
         <artifactId>falcon-website</artifactId>
-        <version>0.8-SNAPSHOT</version>
+        <version>0.9-SNAPSHOT</version>
     </parent>
     <artifactId>falcon-website-general</artifactId>
-    <version>0.8-SNAPSHOT</version>
+    <version>0.9-SNAPSHOT</version>
     <packaging>war</packaging>
 
     <name>Apache Falcon - General</name>

Modified: falcon/trunk/general/src/site/site.xml
URL: http://svn.apache.org/viewvc/falcon/trunk/general/src/site/site.xml?rev=1730449&r1=1730448&r2=1730449&view=diff
==============================================================================
--- falcon/trunk/general/src/site/site.xml (original)
+++ falcon/trunk/general/src/site/site.xml Mon Feb 15 05:48:00 2016
@@ -148,6 +148,7 @@
 
         <menu name="Documentation">
             <!-- current points to latest release -->
+            <item name="0.9 (Current)" href="./0.9/index.html"/>
             <item name="0.8" href="./0.8/index.html"/>
             <item name="0.7" href="./0.7/index.html"/>
             <item name="0.6.1" href="./0.6.1/index.html"/>

Added: falcon/trunk/general/src/site/twiki/Configuration.twiki
URL: http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/Configuration.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/Configuration.twiki (added)
+++ falcon/trunk/general/src/site/twiki/Configuration.twiki Mon Feb 15 05:48:00 2016
@@ -0,0 +1,122 @@
+---+Configuring Falcon
+
+By default, the config directory used by Falcon is {package dir}/conf. To override this (to use the same conf with
+multiple Falcon upgrades), set the environment variable FALCON_CONF to the path of the conf dir.
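For instance, a sketch of overriding the conf dir (the /etc/falcon/conf path is an assumption for illustration, not a Falcon default):

```shell
# Illustrative only: point Falcon at a shared conf dir that survives upgrades.
# /etc/falcon/conf is an assumed path -- use wherever your conf actually lives.
export FALCON_CONF=/etc/falcon/conf
```

Set this in the shell (or profile) that launches the Falcon scripts so they pick it up before running any command.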
+
+falcon-env.sh has been added to the Falcon conf. This file can be used to set various environment variables that you
+need for your services.
+In addition, you can set any other environment variables you might need. This file will be sourced by Falcon scripts
+before any commands are executed. The following environment variables are available to set.
+
+<verbatim>
+# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
+#export JAVA_HOME=
+
+# any additional java opts you want to set. This will apply to both client and server operations
+#export FALCON_OPTS=
+
+# any additional java opts that you want to set for client only
+#export FALCON_CLIENT_OPTS=
+
+# java heap size we want to set for the client. Default is 1024MB
+#export FALCON_CLIENT_HEAP=
+
+# any additional opts you want to set for prism service.
+#export FALCON_PRISM_OPTS=
+
+# java heap size we want to set for the prism service. Default is 1024MB
+#export FALCON_PRISM_HEAP=
+
+# any additional opts you want to set for falcon service.
+#export FALCON_SERVER_OPTS=
+
+# java heap size we want to set for the falcon server. Default is 1024MB
+#export FALCON_SERVER_HEAP=
+
+# What is considered the falcon home dir. Default is the base location of the installed software
+#export FALCON_HOME_DIR=
+
+# Where log files are stored. Default is logs directory under the base install location
+#export FALCON_LOG_DIR=
+
+# Where pid files are stored. Default is logs directory under the base install location
+#export FALCON_PID_DIR=
+
+# where the falcon active mq data is stored. Default is logs/data directory under the base install location
+#export FALCON_DATA_DIR=
+
+# Where do you want to expand the war file. By default it is in the /server/webapp dir under the base install dir.
+#export FALCON_EXPANDED_WEBAPP_DIR=
+</verbatim>
+
+---++Advanced Configurations
+
+---+++Configuring Monitoring plugin to register catalog partitions
+Falcon comes with a monitoring plugin that registers catalog partitions. This comes in handy during migration from
+filesystem-based feeds to HCatalog-based feeds.
+This plugin enables the user to de-couple the partition registration and assume that all partitions are already on
+HCatalog even before the migration, simplifying the HCatalog migration.
+
+By default this plugin is disabled.
+To enable this plugin and leverage the feature, there are 3 pre-requisites:
+<verbatim>
+In {package dir}/conf/startup.properties, add
+*.workflow.execution.listeners=org.apache.falcon.catalog.CatalogPartitionHandler
+
+In the cluster definition, ensure registry endpoint is defined.
+Ex:
+<interface type="registry" endpoint="thrift://localhost:1109" version="0.13.3"/>
+
+In the feed definition, ensure the corresponding catalog table is mentioned in feed-properties
+Ex:
+<properties>
+    <property name="catalog.table" value="catalog:default:in_table#year={YEAR};month={MONTH};day={DAY};hour={HOUR};
+    minute={MINUTE}"/>
+</properties>
+</verbatim>
+
+*NOTE : for Mac OS users*
+<verbatim>
+If you are using Mac OS, you will need to configure FALCON_SERVER_OPTS (explained above).
+
+In  {package dir}/conf/falcon-env.sh uncomment the following line
+#export FALCON_SERVER_OPTS=
+
+and change it to look as below
+export FALCON_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
+</verbatim>
+
+---+++Activemq
+
+* The Falcon server starts an embedded ActiveMQ. To control this behaviour, set the following system properties using
+the -D option in the environment variable FALCON_OPTS:
+   * falcon.embeddedmq=<true/false> - Should the server start embedded ActiveMQ; default true
+   * falcon.embeddedmq.port=<port> - Port for embedded ActiveMQ; default 61616
+   * falcon.embeddedmq.data=<path> - Data path for embedded ActiveMQ; default {package dir}/logs/data
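As a hedged example, these properties can be combined in falcon-env.sh; the property names are the ones listed above, while the choice of values (disabling the embedded broker, keeping the default port) is purely illustrative:

```shell
# Sketch: turn the embedded ActiveMQ off and state the port explicitly.
# falcon.embeddedmq and falcon.embeddedmq.port are the properties documented
# above; the values chosen here are illustrative, not recommendations.
export FALCON_OPTS="-Dfalcon.embeddedmq=false -Dfalcon.embeddedmq.port=61616"
```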
+
+---+++Falcon System Notifications
+Some Falcon features such as late data handling, retries, metadata service, depend on JMS notifications sent when the Oozie workflow completes. These system notifications are sent as part of Falcon Post Processing action. Given that the post processing action is also a job, it is prone to failures and in case of failures, Falcon is blind to the status of the workflow. To alleviate this problem and make the notifications more reliable, you can enable Oozie's JMS notification feature and disable Falcon post-processing notification by making the following changes:
+   * In Falcon runtime.properties, set *.falcon.jms.notification.enabled to false. This will turn off JMS notification in post-processing.
+   * Copy notification related properties in oozie/conf/oozie-site.xml to oozie-site.xml of the Oozie installation.  Restart Oozie so changes get reflected.  
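Concretely, the Falcon-side change from the first step is a single line in runtime.properties (the property name is taken from the step above):

```
*.falcon.jms.notification.enabled=false
```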
+
+*NOTE : If you disable Falcon post-processing JMS notification and do not enable Oozie JMS notification, features such as failure retry, late data handling and the metadata service will be disabled for all entities on the server.*
+
+---+++Enabling Falcon Native Scheduler
+You can choose to schedule entities using either Oozie's coordinator or Falcon's native scheduler. To schedule entities natively on Falcon, you will need to add some additional properties to <verbatim>$FALCON_HOME/conf/startup.properties</verbatim> before starting the Falcon Server. For details, refer to [[FalconNativeScheduler][Falcon Native Scheduler]]
+
+---+++Adding Extension Libraries
+
+Library extensions allow users to add custom libraries to entity lifecycles such as feed retention, feed replication
+and process execution. This is useful for use cases such as adding filesystem extensions. To enable this, add the
+following configs to startup.properties:
+*.libext.paths=<paths to be added to all entity lifecycles>
+
+*.libext.feed.paths=<paths to be added to all feed lifecycles>
+
+*.libext.feed.retentions.paths=<paths to be added to feed retention workflow>
+
+*.libext.feed.replication.paths=<paths to be added to feed replication workflow>
+
+*.libext.process.paths=<paths to be added to process workflow>
+
+The configured jars are added to the Falcon classpath and the corresponding workflows.
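As a sketch, assuming custom jars have been uploaded to hypothetical HDFS locations, the resulting startup.properties entries might look like:

```
*.libext.paths=/projects/falcon/libext
*.libext.feed.replication.paths=/projects/falcon/libext/replication
```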

Added: falcon/trunk/general/src/site/twiki/Distributed-mode.twiki
URL: http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/Distributed-mode.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/Distributed-mode.twiki (added)
+++ falcon/trunk/general/src/site/twiki/Distributed-mode.twiki Mon Feb 15 05:48:00 2016
@@ -0,0 +1,198 @@
+---+Distributed Mode
+
+
+Following are the steps needed to package and deploy Falcon in Distributed Mode. You need to complete Steps 1-3
+mentioned [[InstallationSteps][here]] before proceeding further.
+
+---++Package Falcon
+Ensure that you are in the base directory (where you cloned Falcon). Let’s call it {project dir}
+
+<verbatim>
+$mvn clean assembly:assembly -DskipTests -DskipCheck=true -Pdistributed,hadoop-2
+</verbatim>
+
+
+<verbatim>
+$ls {project dir}/target/
+</verbatim>
+
+It should give output like the one below:
+<verbatim>
+apache-falcon-distributed-${project.version}-server.tar.gz
+apache-falcon-distributed-${project.version}-sources.tar.gz
+archive-tmp
+maven-shared-archive-resources
+</verbatim>
+
+   * apache-falcon-distributed-${project.version}-sources.tar.gz contains the source files of the Falcon repo.
+
+   * apache-falcon-distributed-${project.version}-server.tar.gz contains the project artifacts along with its
+dependencies, configuration files and scripts required to deploy Falcon.
+
+
+The tar can be found at {project dir}/target/apache-falcon-distributed-${project.version}-server.tar.gz . This is the
+tar used for installing Falcon. Let's call it {falcon package}
+
+The tar is structured as follows.
+
+<verbatim>
+
+|- bin
+   |- falcon
+   |- falcon-start
+   |- falcon-stop
+   |- falcon-status
+   |- falcon-config.sh
+   |- service-start.sh
+   |- service-stop.sh
+   |- service-status.sh
+   |- prism-stop
+   |- prism-start
+   |- prism-status
+|- conf
+   |- startup.properties
+   |- runtime.properties
+   |- client.properties
+   |- prism.keystore
+   |- log4j.xml
+   |- falcon-env.sh
+|- docs
+|- client
+   |- lib (client support libs)
+|- server
+   |- webapp
+      |- falcon.war
+      |- prism.war
+|- oozie
+   |- conf
+   |- libext
+|- hadooplibs
+|- README
+|- NOTICE.txt
+|- LICENSE.txt
+|- DISCLAIMER.txt
+|- CHANGES.txt
+</verbatim>
+
+
+---++Installing & running Falcon
+
+---+++Installing Falcon
+
+Running Falcon in distributed mode requires bringing up both the prism and the server. As the name suggests, the Falcon
+prism splits the requests it gets across the Falcon servers. It is a good practice to start the prism and the server
+with their corresponding configurations separately. Create separate directories for the prism and the server. Let's
+call them {falcon-prism-dir} and {falcon-server-dir} respectively.
+
+*For prism*
+<verbatim>
+$mkdir {falcon-prism-dir}
+$tar -xzvf {falcon package}
+</verbatim>
+
+*For server*
+<verbatim>
+$mkdir {falcon-server-dir}
+$tar -xzvf {falcon package}
+</verbatim>
+
+
+---+++Starting Prism
+
+<verbatim>
+cd {falcon-prism-dir}/falcon-distributed-${project.version}
+bin/prism-start [-port <port>]
+</verbatim>
+
+By default,
+* The prism server starts at port 16443. To change the port, use the -port option.
+
+* falcon.enableTLS can be set to true or false explicitly to enable or disable SSL. If it is not set, a port that ends
+with 443 will automatically put prism on https://.
+
+* prism starts with conf from {falcon-prism-dir}/falcon-distributed-${project.version}/conf. To override this (to use
+the same conf with multiple prism upgrades), set environment variable FALCON_CONF to the path of conf dir. You can find
+the instructions for configuring Falcon [[Configuration][here]].
+
+*Enabling prism-client*
+If prism is not started using the default port 16443, then edit the following property in
+{falcon-prism-dir}/falcon-distributed-${project.version}/conf/client.properties
+falcon.url=http://{machine-ip}:{prism-port}/
+
+
+---+++Starting Falcon Server
+
+<verbatim>
+$cd {falcon-server-dir}/falcon-distributed-${project.version}
+$bin/falcon-start [-port <port>]
+</verbatim>
+
+By default,
+* If falcon.enableTLS is set to true explicitly or not set at all, Falcon starts at port 15443 on https:// by default.
+
+* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 on http://.
+
+* To change the port, use -port option.
+
+* If falcon.enableTLS is not set explicitly, a port that ends with 443 will automatically put Falcon on https://. Any
+other port will put Falcon on http://.
+
+* The server starts with conf from {falcon-server-dir}/falcon-distributed-${project.version}/conf. To override this (to
+use the same conf with multiple server upgrades), set environment variable FALCON_CONF to the path of the conf dir. You
+can find the instructions for configuring Falcon [[Configuration][here]].
+
+*Enabling server-client*
+If the server is not started using the default port 15443, then edit the following property in
+{falcon-server-dir}/falcon-distributed-${project.version}/conf/client.properties
+falcon.url=http://{machine-ip}:{server-port}/
+
+*NOTE* : https is the secure version of HTTP, the protocol over which data is sent between your browser and the website
+that you are connected to. By default, Falcon runs in https mode, but the user can configure it to use http.
+
+
+---+++Using Falcon
+
+<verbatim>
+$cd {falcon-prism-dir}/falcon-distributed-${project.version}
+$bin/falcon admin -version
+Falcon server build version: {Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",
+Mode:"embedded"}
+
+$bin/falcon help
+(for more details about Falcon cli usage)
+</verbatim>
+
+
+---+++Dashboard
+
+Once Falcon / prism is started, you can view the status of Falcon entities using the Web-based dashboard. You can open
+your browser at the corresponding port to use the web UI.
+
+The Falcon dashboard makes the REST API calls as user "falcon-dashboard". If this user does not exist on your Falcon
+and Oozie servers, please create the user.
+
+<verbatim>
+## create user.
+[root@falconhost ~] useradd -U -m falcon-dashboard -G users
+
+## verify user is created with membership in correct groups.
+[root@falconhost ~] groups falcon-dashboard
+falcon-dashboard : falcon-dashboard users
+[root@falconhost ~]
+</verbatim>
+
+
+---+++Stopping Falcon Server
+
+<verbatim>
+$cd {falcon-server-dir}/falcon-distributed-${project.version}
+$bin/falcon-stop
+</verbatim>
+
+---+++Stopping Falcon Prism
+
+<verbatim>
+$cd {falcon-prism-dir}/falcon-distributed-${project.version}
+$bin/prism-stop
+</verbatim>

Added: falcon/trunk/general/src/site/twiki/Embedded-mode.twiki
URL: http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/Embedded-mode.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/Embedded-mode.twiki (added)
+++ falcon/trunk/general/src/site/twiki/Embedded-mode.twiki Mon Feb 15 05:48:00 2016
@@ -0,0 +1,198 @@
+---+Embedded Mode
+
+Following are the steps needed to package and deploy Falcon in Embedded Mode. You need to complete Steps 1-3 mentioned
+ [[InstallationSteps][here]] before proceeding further.
+
+---++Package Falcon
+Ensure that you are in the base directory (where you cloned Falcon). Let’s call it {project dir}
+
+<verbatim>
+$mvn clean assembly:assembly -DskipTests -DskipCheck=true
+</verbatim>
+
+<verbatim>
+$ls {project dir}/target/
+</verbatim>
+It should give output like the one below:
+<verbatim>
+apache-falcon-${project.version}-bin.tar.gz
+apache-falcon-${project.version}-sources.tar.gz
+archive-tmp
+maven-shared-archive-resources
+</verbatim>
+
+* apache-falcon-${project.version}-sources.tar.gz contains the source files of the Falcon repo.
+
+* apache-falcon-${project.version}-bin.tar.gz contains the project artifacts along with its dependencies,
+configuration files and scripts required to deploy Falcon.
+
+The tar can be found at {project dir}/target/apache-falcon-${project.version}-bin.tar.gz
+
+The tar is structured as follows:
+
+<verbatim>
+
+|- bin
+   |- falcon
+   |- falcon-start
+   |- falcon-stop
+   |- falcon-status
+   |- falcon-config.sh
+   |- service-start.sh
+   |- service-stop.sh
+   |- service-status.sh
+|- conf
+   |- startup.properties
+   |- runtime.properties
+   |- prism.keystore
+   |- client.properties
+   |- log4j.xml
+   |- falcon-env.sh
+|- docs
+|- client
+   |- lib (client support libs)
+|- server
+   |- webapp
+      |- falcon.war
+|- data
+   |- falcon-store
+   |- graphdb
+   |- localhost
+|- examples
+   |- app
+      |- hive
+      |- oozie-mr
+      |- pig
+   |- data
+   |- entity
+      |- filesystem
+      |- hcat
+|- oozie
+   |- conf
+   |- libext
+|- logs
+|- hadooplibs
+|- README
+|- NOTICE.txt
+|- LICENSE.txt
+|- DISCLAIMER.txt
+|- CHANGES.txt
+</verbatim>
+
+
+---++Installing & running Falcon
+
+Running Falcon in embedded mode requires bringing up only the Falcon server.
+
+<verbatim>
+$tar -xzvf {falcon package}
+$cd falcon-${project.version}
+</verbatim>
+
+
+---+++Starting Falcon Server
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-start [-port <port>]
+</verbatim>
+
+By default,
+* If falcon.enableTLS is set to true explicitly or not set at all, Falcon starts at port 15443 on https:// by default.
+
+* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 on http://.
+
+* To change the port, use -port option.
+
+* If falcon.enableTLS is not set explicitly, a port that ends with 443 will automatically put Falcon on https://. Any
+other port will put Falcon on http://.
+
+* The server starts with conf from falcon-${project.version}/conf. To override this (to use
+the same conf with multiple server upgrades), set environment variable FALCON_CONF to the path of the conf dir. You can
+find the instructions for configuring Falcon [[Configuration][here]].
+
+
+---+++Enabling server-client
+If the server is not started using the default port 15443, then edit the following property in
+{falcon-server-dir}/falcon-${project.version}/conf/client.properties
+
+falcon.url=http://{machine-ip}:{server-port}/
+
+
+---+++Using Falcon
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon admin -version
+Falcon server build version: {Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",Mode:
+"embedded",Hadoop:"${hadoop.version}"}
+
+$bin/falcon help
+(for more details about Falcon cli usage)
+</verbatim>
+
+*Note* : https is the secure version of HTTP, the protocol over which data is sent between your browser and the website
+that you are connected to. By default, Falcon runs in https mode, but the user can configure it to use http.
+
+
+---+++Dashboard
+
+Once Falcon server is started, you can view the status of Falcon entities using the Web-based dashboard. You can open
+your browser at the corresponding port to use the web UI.
+
+The Falcon dashboard makes the REST API calls as user "falcon-dashboard". If this user does not exist on your Falcon
+and Oozie servers, please create the user.
+
+<verbatim>
+## create user.
+[root@falconhost ~] useradd -U -m falcon-dashboard -G users
+
+## verify user is created with membership in correct groups.
+[root@falconhost ~] groups falcon-dashboard
+falcon-dashboard : falcon-dashboard users
+[root@falconhost ~]
+</verbatim>
+
+
+---++Running Examples using embedded package
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-start
+</verbatim>
+Make sure the Hadoop and Oozie endpoints in examples/entity/filesystem/standalone-cluster.xml match your setup.
+The cluster locations, i.e. the staging and working dirs, MUST be created prior to submitting a cluster entity to
+Falcon.
+*staging* must have 777 permissions and the parent dirs must have execute permissions.
+*working* must have 755 permissions and the parent dirs must have execute permissions.
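Assuming the cluster entity points staging and working at /projects/falcon/staging and /projects/falcon/working (illustrative paths; substitute the locations from your own standalone-cluster.xml), the dirs and permissions above can be created with:

```
$hadoop fs -mkdir -p /projects/falcon/staging /projects/falcon/working
$hadoop fs -chmod 777 /projects/falcon/staging
$hadoop fs -chmod 755 /projects/falcon/working
```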
+<verbatim>
+$bin/falcon entity -submit -type cluster -file examples/entity/filesystem/standalone-cluster.xml
+</verbatim>
+Submit input and output feeds:
+<verbatim>
+$bin/falcon entity -submit -type feed -file examples/entity/filesystem/in-feed.xml
+$bin/falcon entity -submit -type feed -file examples/entity/filesystem/out-feed.xml
+</verbatim>
+Set-up workflow for the process:
+<verbatim>
+$hadoop fs -put examples/app /
+</verbatim>
+Submit and schedule the process:
+<verbatim>
+$bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/oozie-mr-process.xml
+$bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/pig-process.xml
+</verbatim>
+Generate input data:
+<verbatim>
+$examples/data/generate.sh <<hdfs endpoint>>
+</verbatim>
+Get status of instances:
+<verbatim>
+$bin/falcon instance -status -type process -name oozie-mr-process -start 2013-11-15T00:05Z -end 2013-11-15T01:00Z
+</verbatim>
+
+HCat based example entities are in examples/entity/hcat.
+
+
+---+++Stopping Falcon Server
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-stop
+</verbatim>

Modified: falcon/trunk/general/src/site/twiki/EntitySpecification.twiki
URL: http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/EntitySpecification.twiki?rev=1730449&r1=1730448&r2=1730449&view=diff
==============================================================================
--- falcon/trunk/general/src/site/twiki/EntitySpecification.twiki (original)
+++ falcon/trunk/general/src/site/twiki/EntitySpecification.twiki Mon Feb 15 05:48:00 2016
@@ -70,7 +70,7 @@ Path is the hdfs path for each location.
 Falcon would use the location to do intermediate processing of entities in hdfs and hence Falcon
 should have read/write/execute permission on these locations.
 These locations MUST be created prior to submitting a cluster entity to Falcon.
-*staging* should have atleast 755 permissions and is a mandatory location .The parent dirs must have execute permissions so multiple
+*staging* should have 777 permissions and is a mandatory location. The parent dirs must have execute permissions so multiple
 users can write to this location. *working* must have 755 permissions and is a optional location.
 If *working* is not specified, falcon creates a sub directory in the *staging* location with 755 perms.
 The parent dir for *working* must have execute permissions so multiple
@@ -98,6 +98,61 @@ A key-value pair, which are propagated t
 Ideally JMS impl class name of messaging engine (brokerImplClass) 
 should be defined here.
 
+---++ Datasource Specification
+
+The datasource entity contains the connection information required to connect to a data source, such as a MySQL
+database.
+The datasource XSD specification is available here:
+A datasource contains read and write interfaces which are used by Falcon to import or export data from or to
+datasources respectively. A datasource is referenced by name from the feeds that are on-boarded to Falcon.
+
+Following are the tags defined in a datasource.xml:
+
+<verbatim>
+<datasource colo="west-coast" description="Customer database on west coast" type="mysql"
+ name="test-hsql-db" xmlns="uri:falcon:datasource:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+</verbatim>
+
+The colo specifies the colo to which the datasource belongs, and name is the name of the datasource, which has to
+be unique.
+
+---+++ Interfaces
+
+A datasource has two interfaces as described below:
+<verbatim>
+    <interface type="readonly" endpoint="jdbc:hsqldb:localhost/db"/>
+</verbatim>
+
+A readonly interface specifies the endpoint and protocol to connect to a datasource.
+This would be used in the context of import from datasource into HDFS.
+
+<verbatim>
+<interface type="write" endpoint="jdbc:hsqldb:localhost/db1"/>
+</verbatim>
+
+A write interface specifies the endpoint and protocol to write to the datasource.
+Falcon uses this interface to export data from HDFS to the datasource.
+
+<verbatim>
+<credential type="password-text">
+    <userName>SA</userName>
+    <passwordText></passwordText>
+</credential>
+</verbatim>
+
+
+A credential is associated with an interface (read or write) providing user name and password to authenticate
+to the datasource.
+
+<verbatim>
+<credential type="password-file">
+     <userName>SA</userName>
+     <passwordFile>hdfs-file-path</passwordFile>
+</credential>
+</verbatim>
+
+The credential can alternatively be specified via a password file stored in HDFS. This file should be accessible
+only by the user.
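+
+Putting the pieces together, a complete datasource definition might look like the sketch below. The entity name,
+endpoints, database and credential values are hypothetical, and the exact nesting of interfaces and credentials
+should be verified against the datasource XSD:
+
+<verbatim>
+<datasource colo="west-coast" description="Customer database on west coast" type="mysql"
+            name="customer-mysql-db" xmlns="uri:falcon:datasource:0.1">
+    <interfaces>
+        <interface type="readonly" endpoint="jdbc:mysql://dbhost/customer">
+            <credential type="password-file">
+                <userName>import_user</userName>
+                <passwordFile>/apps/falcon/conf/import-user-password</passwordFile>
+            </credential>
+        </interface>
+        <interface type="write" endpoint="jdbc:mysql://dbhost/customer">
+            <credential type="password-text">
+                <userName>export_user</userName>
+                <passwordText>export_password</passwordText>
+            </credential>
+        </interface>
+    </interfaces>
+</datasource>
+</verbatim>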
+
 ---++ Feed Specification
 The Feed XSD specification is available here.
 A Feed defines various attributes of feed like feed location, frequency, late-arrival handling and retention policies.
@@ -244,6 +299,35 @@ expressions like frequency. slaLow is in
 availability SLAs. slaHigh is intended to serve for reporting the feeds which missed their SLAs. SLAs are relative to
 feed instance time.
 
+---+++ Import
+
+<verbatim>
+<import>
+    <source name="test-hsql-db" tableName="customer">
+        <extract type="full">
+            <mergepolicy>snapshot</mergepolicy>
+         </extract>
+         <fields>
+            <includes>
+                <field>id</field>
+                <field>name</field>
+            </includes>
+         </fields>
+    </source>
+    <arguments>
+        <argument name="--split-by" value="id"/>
+        <argument name="--num-mappers" value="2"/>
+    </arguments>
+</import>
+</verbatim>
+
+A feed can have an import policy associated with it. The source name specifies the reference to the
+datasource entity from which the data will be imported to HDFS. The tableName specifies the table or topic to be
+imported from the datasource. The extract type specifies the pull mechanism (full or
+incremental extract). The full extract method extracts all the data from the datasource. The incremental extraction
+method is still under implementation. The mergepolicy determines how the data is to be laid out on HDFS.
+The snapshot layout creates a snapshot of the data on HDFS using the feed's location specification. Fields is used
+to specify the projection columns. Feed import from a database uses Sqoop underneath to achieve the task. Any advanced
+Sqoop options can be specified via the arguments.
 
 ---+++ Late Arrival
 
@@ -256,6 +340,18 @@ upto 8 hours then late-arrival's cut-off
 
 *Note:* This will only apply for !FileSystem storage but not Table storage until a future time.
 
+
+---+++ Email Notification
+
+<verbatim>
+    <notification type="email" to="bob@xyz.com"/>
+</verbatim>
+Specifying the notification element with "type" property allows users to receive email notification when a scheduled feed instance completes.
+Multiple recipients of an email can be provided as comma separated addresses with "to" property.
+To send email notification ensure that SMTP parameters are defined in Falcon startup.properties.
+Refer to [[FalconEmailNotification][Falcon Email Notification]] for more details.
+
+
 ---+++ ACL
 
 A feed has ACL (Access Control List) useful for implementing permission requirements
@@ -280,6 +376,13 @@ permission indicates the permission.
         <property name="parallel" value="3"/>
         <property name="maxMaps" value="8"/>
         <property name="mapBandwidth" value="1"/>
+        <property name="overwrite" value="true"/>
+        <property name="ignoreErrors" value="false"/>
+        <property name="skipChecksum" value="false"/>
+        <property name="removeDeletedFiles" value="true"/>
+        <property name="preserveBlockSize" value="true"/>
+        <property name="preserveReplicationNumber" value="true"/>
+        <property name="preservePermission" value="true"/>
         <property name="order" value="LIFO"/>
     </properties>
 </verbatim>
@@ -288,9 +391,59 @@ available to user to specify the Hadoop
 "timeout", "parallel" and "order" are other special properties which decides replication instance's timeout value while
 waiting for the feed instance, parallel decides the concurrent replication instances that can run at any given time and
 order decides the execution order for replication instances like FIFO, LIFO and LAST_ONLY.
-"maxMaps" represents the maximum number of maps used during replication. "mapBandwidth" represents the bandwidth in MB/s
-used by each mapper during replication.
- 
+DistCp options can be passed as custom properties, which will be propagated to the DistCp tool. "maxMaps" represents
+the maximum number of maps used during replication. "mapBandwidth" represents the bandwidth in MB/s
+used by each mapper during replication. "overwrite" overwrites the destination during replication.
+"ignoreErrors" ignores failures during replication without causing the job to fail. "skipChecksum" bypasses
+checksum verification during replication. "removeDeletedFiles" deletes the files existing in the
+destination but not in the source during replication. "preserveBlockSize" preserves the block size during
+replication. "preserveReplicationNumber" preserves the replication number during replication.
+"preservePermission" preserves permissions during replication.
+
+
+---+++ Lifecycle
+<verbatim>
+
+<lifecycle>
+    <retention-stage>
+        <frequency>hours(10)</frequency>
+        <queue>reports</queue>
+        <priority>NORMAL</priority>
+        <properties>
+            <property name="retention.policy.agebaseddelete.limit" value="hours(9)"></property>
+        </properties>
+    </retention-stage>
+</lifecycle>
+
+</verbatim>
+
+The lifecycle tag is the new way to define various stages of a feed's lifecycle. In the example above we have defined a
+retention-stage using the lifecycle tag. You may define lifecycle at the global level, the cluster level, or both.
+Cluster-level configuration takes precedence, and Falcon falls back to the global definition if the cluster-level
+specification is missing.
+
+
+---++++ Retention Stage
+As of now there are two ways to specify retention. One is through the <retention> tag in the cluster and the other is
+the new way, through the <retention-stage> tag inside the <lifecycle> tag. If both are defined for a feed, the
+lifecycle tag takes effect and Falcon ignores the <retention> tag in the cluster. If the retention-stage configuration
+inside the lifecycle tag is invalid, Falcon will *NOT* fall back to the retention tag even if it is defined, and will
+throw a validation error.
+
+In this new method of defining retention you can specify the frequency at which retention should occur, and you can
+also define the queue and priority parameters for retention jobs. The default behavior of retention-stage is the same
+as the existing one, which is to delete all instances whose instance-time is earlier than the duration provided in
+"retention.policy.agebaseddelete.limit".
+
+The property "retention.policy.agebaseddelete.limit" is mandatory and must contain a valid duration, e.g. "hours(1)".
+Retention frequency is not a mandatory parameter. If the user doesn't specify the frequency in the retention stage,
+it doesn't fall back to the old retention policy frequency. Its default value is set to 6 hours if the feed frequency
+is less than 6 hours, else it is set to the feed frequency, as retention shouldn't run more frequently than data
+becomes available, to avoid wasting compute resources.
+
+In future, we will allow more customisation through this method, such as customising how instances to be deleted are chosen.
+
+
+
 ---++ Process Specification
 A process defines configuration for a workflow. A workflow is a directed acyclic graph(DAG) which defines the job for the workflow engine. A process definition defines  the configurations required to run the workflow job. For example, process defines the frequency at which the workflow should run, the clusters on which the workflow should run, the inputs and outputs for the workflow, how the workflow failures should be handled, how the late inputs should be handled and so on.  
 
@@ -657,10 +810,12 @@ Syntax:
 </process>
 </verbatim>
 
-queueName and jobPriority are special properties, which when present are used by the Falcon's launcher job, the same property is also available in workflow which can be used to propagate to pig or M/R job.
+The following are some special properties, which when present are used by Falcon's launcher job. The same properties are also available to the workflow, which can propagate them to a Pig or M/R job.
 <verbatim>
         <property name="queueName" value="hadoopQueue"/>
         <property name="jobPriority" value="VERY_HIGH"/>
+        <!-- This property is used to turn off JMS notifications for this process. JMS notifications are enabled by default. -->
+        <property name="userJMSNotificationEnabled" value="false"/>
 </verbatim>
 
 ---+++ Workflow
@@ -673,7 +828,7 @@ be in lib folder inside the workflow pat
 The properties defined in the cluster and cluster properties(nameNode and jobTracker) will also
 be available for the workflow.
 
-There are 2 engines supported today.
+There are 3 engines supported today.
 
 ---++++ Oozie
 
@@ -742,7 +897,7 @@ Feeds with Hive table storage will send
 <verbatim>$input_filter</verbatim>
 
 ---+++ Retry
-Retry policy defines how the workflow failures should be handled. Two retry policies are defined: backoff and exp-backoff(exponential backoff). Depending on the delay and number of attempts, the workflow is re-tried after specific intervals.
+Retry policy defines how the workflow failures should be handled. Three retry policies are defined: periodic, exp-backoff(exponential backoff) and final. Depending on the delay and number of attempts, the workflow is re-tried after specific intervals.
 Syntax:
 <verbatim>
 <process name="[process name]">
@@ -756,7 +911,7 @@ Examples:
 <verbatim>
 <process name="sample-process">
 ...
-    <retry policy="backoff" delay="minutes(10)" attempts="3"/>
+    <retry policy="periodic" delay="minutes(10)" attempts="3"/>
 ...
 </process>
 </verbatim>
@@ -806,6 +961,16 @@ This late handling specifies that late d
 
 *Note:* This is only supported for !FileSystem storage but not Table storage at this point.
 
+---+++ Email Notification
+
+<verbatim>
+    <notification type="email" to="bob@@xyz.com"/>
+</verbatim>
+Specifying the notification element with "type" property allows users to receive email notification when a scheduled process instance completes.
+Multiple recipients of an email can be provided as comma separated addresses with "to" property.
+To send email notification ensure that SMTP parameters are defined in Falcon startup.properties.
+Refer to [[FalconEmailNotification][Falcon Email Notification]] for more details.
+
 ---+++ ACL
 
 A process has ACL (Access Control List) useful for implementing permission requirements

Modified: falcon/trunk/general/src/site/twiki/FalconCLI.twiki
URL: http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/FalconCLI.twiki?rev=1730449&r1=1730448&r2=1730449&view=diff
==============================================================================
--- falcon/trunk/general/src/site/twiki/FalconCLI.twiki (original)
+++ falcon/trunk/general/src/site/twiki/FalconCLI.twiki Mon Feb 15 05:48:00 2016
@@ -2,12 +2,36 @@
 
 FalconCLI is an interface between the user and Falcon. It is a command line utility provided by Falcon. FalconCLI supports Entity Management, Instance Management and Admin operations. There is a set of web services that FalconCLI uses to interact with Falcon.
 
+---++Common CLI Options
+
+---+++Falcon URL
+
+The optional -url option indicates the URL of the Falcon system to run the command against. If not mentioned, it will be picked from the system environment variable FALCON_URL. If FALCON_URL is not set then it will be picked from the client.properties file. If the option is not
+provided and also not set in client.properties, Falcon CLI will fail.
+
+---+++Proxy user support
+
+The -doAs option allows the current user to impersonate other users when interacting with the Falcon system. The current user must be configured as a proxyuser in the Falcon system. The proxyuser configuration may restrict the hosts
+from which a user may impersonate others, as well as the groups whose users can be impersonated.
+
+<a href="./FalconDocumentation.html#Proxyuser_support">Proxyuser support described here.</a>
+
+---+++Debug Mode
+
+If you export FALCON_DEBUG=true then the Falcon CLI will output the Web Services API details used by any commands you execute. This is useful for debugging purposes or to see how the Falcon CLI works with the WS API.
+Alternately, you can specify '-debug' through the CLI arguments to get the debug statements.
+Example:
+$FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml -debug
+
 ---++Entity Management Operations
 
 ---+++Submit
 
 Submit option is used to set up an entity definition.
 
+Usage:
+$FALCON_HOME/bin/falcon entity -submit -type [cluster|datasource|feed|process] -file <entity-definition.xml>
+
 Example: 
 $FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml
 
@@ -20,6 +44,8 @@ Once submitted, an entity can be schedul
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [process|feed] -name <<name>> -schedule
 
+Optional Arg : -skipDryRun. When this argument is specified, Falcon skips oozie dryrun.
+
 Example:
 $FALCON_HOME/bin/falcon entity  -type process -name sampleProcess -schedule
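+
+To combine the optional -skipDryRun argument with a schedule (entity name as in the example above):
+$FALCON_HOME/bin/falcon entity -type process -name sampleProcess -schedule -skipDryRun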
 
@@ -42,22 +68,22 @@ Usage:
 Delete removes the submitted entity definition for the specified entity and puts it into the archive.
 
 Usage:
-$FALCON_HOME/bin/falcon entity  -type [cluster|feed|process] -name <<name>> -delete
+$FALCON_HOME/bin/falcon entity  -type [cluster|datasource|feed|process] -name <<name>> -delete
 
 ---+++List
 
 Entities of a particular type can be listed with the list sub-command.
 
 Usage:
-$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -list
+$FALCON_HOME/bin/falcon entity -list
 
-Optional Args : -fields <<field1,field2>> -filterBy <<field1:value1,field2:value2>>
--tags <<tagkey=tagvalue,tagkey=tagvalue>> -nameseq <<namesubsequence>>
+Optional Args : -fields <<field1,field2>>
+-type <<[cluster|datasource|feed|process],[cluster|datasource|feed|process]>>
+-nameseq <<namesubsequence>> -tagkeys <<tagkeyword1,tagkeyword2>>
+-filterBy <<field1:value1,field2:value2>> -tags <<tagkey=tagvalue,tagkey=tagvalue>>
 -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10
 
-<a href="./restapi/EntityList.html">Optional params described here.</a>
-
-
+<a href="./Restapi/EntityList.html">Optional params described here.</a>
 
 
 ---+++Summary
@@ -71,16 +97,18 @@ Optional Args : -start "yyyy-MM-dd'T'HH:
 -filterBy <<field1:value1,field2:value2>> -tags <<tagkey=tagvalue,tagkey=tagvalue>>
 -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10 -numInstances 7
 
-<a href="./restapi/EntitySummary.html">Optional params described here.</a>
+<a href="./Restapi/EntitySummary.html">Optional params described here.</a>
 
 ---+++Update
 
-Update operation allows an already submitted/scheduled entity to be updated. Cluster update is currently
-not allowed.
+Update operation allows an already submitted/scheduled entity to be updated. Cluster and datasource updates are
+currently not allowed.
 
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -update -file <<path_to_file>>
 
+Optional Arg : -skipDryRun. When this argument is specified, Falcon skips oozie dryrun.
+
 Example:
 $FALCON_HOME/bin/falcon entity -type process -name HourlyReportsGenerator -update -file /process/definition.xml
 
@@ -91,26 +119,30 @@ Force Update operation allows an already
 Usage:
 $FALCON_HOME/bin/falcon entity  -type [feed|process] -name <<name>> -touch
 
+Optional Arg : -skipDryRun. When this argument is specified, Falcon skips oozie dryrun.
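+
+Example (entity name is illustrative):
+$FALCON_HOME/bin/falcon entity -type process -name HourlyReportsGenerator -touch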
+
 ---+++Status
 
 Status returns the current status of the entity.
 
 Usage:
-$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -status
+$FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -status
 
 ---+++Dependency
 
-With the use of dependency option, we can list all the entities on which the specified entity is dependent. For example for a feed, dependency return the cluster name and for process it returns all the input feeds, output feeds and cluster names.
+With the use of the dependency option, we can list all the entities on which the specified entity is dependent.
+For example, for a feed, dependency returns the cluster name, and for a process it returns all the input feeds,
+output feeds and cluster names.
 
 Usage:
-$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -dependency
+$FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -dependency
 
 ---+++Definition
 
 Definition option returns the entity definition submitted earlier during submit step.
 
 Usage:
-$FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -name <<name>> -definition
+$FALCON_HOME/bin/falcon entity -type [cluster|datasource|feed|process] -name <<name>> -definition
 
 
 ---+++Lookup
@@ -125,6 +157,54 @@ $FALCON_HOME/bin/falcon entity -type fee
 If you have multiple feeds with location as /data/projects/my-hourly/${YEAR}/${MONTH}/${DAY}/${HOUR} then this command will return all of them.
 
 
+---+++SLAAlert
+<verbatim>
+Since: 0.8
+</verbatim>
+
+This command lists all the feed instances which have missed their SLA and are still not available. If a feed instance
+missed its SLA but is now available, then it will not be reported in the results. The purpose of this API is alerting,
+and hence it doesn't return feed instances which missed their SLA but are available, as they don't require any action.
+
+* Currently SLA monitoring is supported only for feeds.
+
+* The end option is optional and will default to the current time if missing.
+
+* The name option is optional; if provided, only instances of that feed will be considered.
+
+Usage:
+
+*Example 1*
+
+*$FALCON_HOME/bin/falcon entity -type feed -start 2014-09-05T00:00Z -slaAlert  -end 2016-05-03T00:00Z -colo local*
+
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T11:59Z, tags: Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:00Z, tags: Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:01Z, tags: Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:02Z, tags: Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:03Z, tags: Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:04Z, tags: Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:05Z, tags: Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:06Z, tags: Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:07Z, tags: Missed SLA High
+name: out, type: FEED, cluster: local, instanceTime: 2015-09-26T12:08Z, tags: Missed SLA Low
+
+
+Response: default/Success!
+
+Request Id: default/216978070@qtp-830047511-4 - f5a6c129-ab42-4feb-a2bf-c3baed356248
+
+*Example 2*
+
+*$FALCON_HOME/bin/falcon entity -type feed -start 2014-09-05T00:00Z -slaAlert  -end 2016-05-03T00:00Z -colo local -name in*
+
+name: in, type: FEED, cluster: local, instanceTime: 2015-09-26T06:00Z, tags: Missed SLA High
+
+Response: default/Success!
+
+Request Id: default/1580107885@qtp-830047511-7 - f16cbc51-5070-4551-ad25-28f75e5e4cf2
+
+
 ---++Instance Management Options
 
 ---+++Kill
@@ -158,7 +238,7 @@ $FALCON_HOME/bin/falcon instance -type <
 
 Rerun option is used to rerun instances of a given process. On issuing a rerun, by default the execution resumes from the last failed node in the workflow. This option is valid only for process instances in terminal state, i.e. SUCCEEDED, KILLED or FAILED.
 If one wants to forcefully rerun the entire workflow, -force should be passed along with -rerun
-Additionally, you can also specify properties to override via a properties file.
+Additionally, you can also specify properties to override via a properties file; in case of contradiction, the properties file takes priority over the -force option.
 
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -rerun -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" [-force] [-file <<properties file>>]
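+
+For example, to forcefully rerun an entire workflow while overriding properties from a file (the entity name and file path are illustrative):
+$FALCON_HOME/bin/falcon instance -type process -name sampleProcess -rerun -start "2016-01-01T00:00Z" -end "2016-01-01T01:00Z" -force -file /tmp/override.properties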
@@ -187,7 +267,7 @@ Optional Args : -start "yyyy-MM-dd'T'HH:
 -filterBy <<field1:value1,field2:value2>> -lifecycle <<lifecycles>>
 -orderBy field -sortOrder <<sortOrder>> -offset 0 -numResults 10
 
-<a href="./restapi/InstanceStatus.html"> Optional params described here.</a>
+<a href="./Restapi/InstanceStatus.html"> Optional params described here.</a>
 
 ---+++List
 
@@ -196,7 +276,7 @@ If the instance is in WAITING state, mis
 
 Example : Suppose a process has 3 instances: one has succeeded, one is in running state and the other is waiting. The expected output is:
 
-{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"}, {"instance":"2010-01-02T11:05Z","status":"WAITING"}]
+{"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"}, {"instance":"2010-01-02T11:05Z","status":"WAITING"}]}
 
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -list
@@ -205,7 +285,7 @@ Optional Args : -start "yyyy-MM-dd'T'HH:
 -colo <<colo>> -lifecycle <<lifecycles>>
 -filterBy <<field1:value1,field2:value2>> -orderBy field -sortOrder <<sortOrder>> -offset 0 -numResults 10
 
-<a href="./restapi/InstanceList.html">Optional params described here.</a>
+<a href="./Restapi/InstanceList.html">Optional params described here.</a>
 
 ---+++Summary
 
@@ -215,15 +295,16 @@ The unscheduled instances between the sp
 
 Example : Suppose a process has 3 instances: one has succeeded, one is in running state and the other is waiting. The expected output is:
 
-{"status":"SUCCEEDED","message":"getSummary is successful", "cluster": <<name>> [{"SUCCEEDED":"1"}, {"WAITING":"1"}, {"RUNNING":"1"}]}
+{"status":"SUCCEEDED","message":"getSummary is successful", instancesSummary:[{"cluster": <<name>> "map":[{"SUCCEEDED":"1"}, {"WAITING":"1"}, {"RUNNING":"1"}]}]}
 
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -summary
 
-Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
--colo <<colo>> -lifecycle <<lifecycles>>
+Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -colo <<colo>>
+-filterBy <<field1:value1,field2:value2>> -lifecycle <<lifecycles>>
+-orderBy field -sortOrder <<sortOrder>>
 
-<a href="./restapi/InstanceSummary.html">Optional params described here.</a>
+<a href="./Restapi/InstanceSummary.html">Optional params described here.</a>
 
 ---+++Running
 
@@ -235,7 +316,7 @@ $FALCON_HOME/bin/falcon instance -type <
 Optional Args : -colo <<colo>> -lifecycle <<lifecycles>>
 -filterBy <<field1:value1,field2:value2>> -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10
 
-<a href="./restapi/InstanceRunning.html">Optional params described here.</a>
+<a href="./Restapi/InstanceRunning.html">Optional params described here.</a>
 
 ---+++FeedInstanceListing
 
@@ -247,7 +328,7 @@ $FALCON_HOME/bin/falcon instance -type f
 Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
 -colo <<colo>>
 
-<a href="./restapi/FeedInstanceListing.html">Optional params described here.</a>
+<a href="./Restapi/FeedInstanceListing.html">Optional params described here.</a>
 
 ---+++Logs
 
@@ -260,7 +341,7 @@ Optional Args : -start "yyyy-MM-dd'T'HH:
 -colo <<colo>> -lifecycle <<lifecycles>>
 -filterBy <<field1:value1,field2:value2>> -orderBy field -sortOrder <<sortOrder>> -offset 0 -numResults 10
 
-<a href="./restapi/InstanceLogs.html">Optional params described here.</a>
+<a href="./Restapi/InstanceLogs.html">Optional params described here.</a>
 
 ---+++LifeCycle
 
@@ -270,6 +351,14 @@ This can be used with instance managemen
 Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status -lifecycle <<lifecycletype>> -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
 
+---+++Triage
+
+Given a feed/process instance, this command traces its ancestors to find which ancestors have failed. It's useful if
+a lot of instances are failing in a pipeline, as it then finds out the root cause of the pipeline being stuck.
+
+Usage:
+$FALCON_HOME/bin/falcon instance -triage -type <<feed/process>> -name <<name>> -start "yyyy-MM-dd'T'HH:mm'Z'"
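+
+Example (entity name is illustrative):
+$FALCON_HOME/bin/falcon instance -triage -type process -name sampleProcess -start "2015-09-26T12:00Z"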
+
 ---+++Params
 
 Displays the workflow params of a given instance. Where start time is considered as nominal time of that instance and end time won't be considered.
@@ -278,6 +367,41 @@ Usage:
 $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -params -start "yyyy-MM-dd'T'HH:mm'Z'"
 
 
+
+---+++Dependency
+Displays the instances which are dependent on the given instance. For example, for a given process instance it will
+list all the input feed instances (if any) and the output feed instances (if any).
+
+An example use case of this command is as follows:
+Suppose you find out that the data in a feed instance was incorrect and you need to figure out which process instances
+consumed this feed instance so that you can reprocess them after correcting the feed instance. You can give the feed
+instance and it will tell you which process instance produced this feed and which process instances consumed it.
+
+NOTE:
+1. instanceTime must be a valid instanceTime e.g. the instanceTime of a feed should be in its validity range on applicable clusters,
+ and it should be in the range of instances produced by the producer process (if any)
+
+2. For processes with inputs like latest() which vary with time the results are not guaranteed to be correct.
+
+Usage:
+$FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -dependency -instanceTime "yyyy-MM-dd'T'HH:mm'Z'"
+
+For example:
+$FALCON_HOME/bin/falcon instance -dependency -type feed -name out -instanceTime 2014-12-15T00:00Z
+name: producer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:00Z, tags: Output
+name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:03Z, tags: Input
+name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:04Z, tags: Input
+name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:02Z, tags: Input
+name: consumer, type: PROCESS, cluster: local, instanceTime: 2014-12-15T00:05Z, tags: Input
+
+
+Response: default/Success!
+
+Request Id: default/1125035965@qtp-503156953-7 - 447be0ad-1d38-4dce-b438-20f3de69b172
+
+
+<a href="./Restapi/InstanceDependency.html">Optional params described here.</a>
+
 ---++ Metadata Lineage Options
 
 ---+++Lineage
@@ -341,7 +465,7 @@ $FALCON_HOME/bin/falcon metadata -edge -
 
 Lists all dimensions of a given type. If the user provides the optional param cluster, only the dimensions related to that cluster are listed.
 Usage:
-$FALCON_HOME/bin/falcon metadata -list -type [cluster_entity|feed_entity|process_entity|user|colo|tags|groups|pipelines]
+$FALCON_HOME/bin/falcon metadata -list -type [cluster_entity|datasource_entity|feed_entity|process_entity|user|colo|tags|groups|pipelines|replication_metrics]
 
 Optional Args : -cluster <<cluster name>>
 
@@ -349,6 +473,17 @@ Example:
 $FALCON_HOME/bin/falcon metadata -list -type process_entity -cluster primary-cluster
 $FALCON_HOME/bin/falcon metadata -list -type tags
 
+
+To display replication metrics from a recipe-based replication process or from feed replication:
+Usage:
+$FALCON_HOME/bin/falcon metadata -list -type replication_metrics -process/-feed <entity name>
+Optional Args : -numResults <<value>>
+
+Example:
+$FALCON_HOME/bin/falcon metadata -list -type replication_metrics -process hdfs-replication
+$FALCON_HOME/bin/falcon metadata -list -type replication_metrics -feed fs-replication
+
+
 ---+++ Relations
 
 List all dimensions related to specified Dimension identified by dimension-type and dimension-name.

Modified: falcon/trunk/general/src/site/twiki/FalconDocumentation.twiki
URL: http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/FalconDocumentation.twiki?rev=1730449&r1=1730448&r2=1730449&view=diff
==============================================================================
--- falcon/trunk/general/src/site/twiki/FalconDocumentation.twiki (original)
+++ falcon/trunk/general/src/site/twiki/FalconDocumentation.twiki Mon Feb 15 05:48:00 2016
@@ -15,7 +15,10 @@
    * <a href="#Security">Security</a>
    * <a href="#Recipes">Recipes</a>
    * <a href="#Monitoring">Monitoring</a>
+   * <a href="#Email_Notification">Email Notification</a>
    * <a href="#Backwards_Compatibility">Backwards Compatibility Instructions</a>
+   * <a href="#Proxyuser_support">Proxyuser support</a>
+   * <a href="#ImportExport">Data Import and Export</a>
 
 ---++ Architecture
 
@@ -35,6 +38,8 @@ Falcon system has picked Oozie as the de
 other schedulers. Lot of the data processing in hadoop requires scheduling to be based on both data availability
 as well as time. Oozie currently supports these capabilities off the shelf and hence the choice.
 
+While the use of Oozie works reasonably well, there are scenarios where Oozie scheduling is proving to be a limiting factor. In its current form, Falcon relies on Oozie for both scheduling and for workflow execution, due to which the scheduling is limited to time based/cron based scheduling with additional gating conditions on data availability. Also, this imposes restrictions on datasets being periodic/cyclic in nature. In order to offer better scheduling capabilities, Falcon comes with its own native scheduler. Refer to [[FalconNativeScheduler][Falcon Native Scheduler]] for details.
+
 ---+++ Control flow
 Though the actual responsibility of the workflow is with the scheduler (Oozie), Falcon remains in the
 execution path, by subscribing to messages that each of the workflow may generate. When Falcon generates a
@@ -153,7 +158,7 @@ Examples:
 
 
 ---++ Entity Management actions
-All the following operation can also be done using [[./restapi/ResourceList][Falcon's RESTful API]].
+All the following operation can also be done using [[restapi/ResourceList][Falcon's RESTful API]].
 
 ---+++ Submit
 Entity submit action allows a new cluster/feed/process to be setup within Falcon. Submitted entity is not
@@ -252,17 +257,21 @@ feed/data xml in the following manner fo
 </verbatim>
 
 The 'limit' attribute can be specified in units of minutes/hours/days/months, and a corresponding numeric value can
-be attached to it. It essentially instructs the system to retain data spanning from the current moment to the time specified
-in the attribute spanning backwards in time. Any data beyond the limit (past/future) is erased from the system.
+be attached to it. It essentially instructs the system to retain data up to the duration specified in the attribute,
+measured backwards in time from now. Any data older than that is erased from the system. By default,
+Falcon runs retention jobs up to the cluster validity end time. This causes the instances created within the endTime
+and "endTime - retentionLimit" to be retained forever. If you do not want to retain any instances of the
+feed past the cluster validity end time, set the property "falcon.retention.keep.instances.beyond.validity"
+to false in runtime.properties.
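+
+For example, the following runtime.properties fragment disables retention beyond the cluster validity end time
+(the "*." domain prefix follows the convention used elsewhere in Falcon properties files; verify against your installation):
+<verbatim>
+*.falcon.retention.keep.instances.beyond.validity=false
+</verbatim>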
 
 With the integration of Hive, Falcon also provides retention for tables in Hive catalog.
 
 ---+++ Example:
 If retention period is 10 hours, and the policy kicks in at time 't', the data retained by system is essentially the
-one in range [t-10h, t]. Any data before t-10h and after t is removed from the system.
+one at or after t-10h. Any data before t-10h is removed from the system.
 
-The 'action' attribute can attain values of DELETE/ARCHIVE. Based upon the tag value, the data eligible for removal is either
-deleted/archived.
+The 'action' attribute can attain values of DELETE/ARCHIVE. Based upon the tag value, the data eligible for removal is
+either deleted/archived.
 
 ---+++ NOTE: Falcon 0.1/0.2 releases support Delete operation only
 
@@ -319,6 +328,16 @@ replication instance delays. If the freq
 instance will run every 2 hours and replicates data with an offset of 1 hour, i.e. at 09:00 UTC, feed instance which
 is eligible for replication is 08:00; and 11:00 UTC, feed instance of 10:00 UTC is eligible and so on.
 
+To capture feed replication metrics such as TIMETAKEN, COPY and BYTESCOPIED, set the parameter "job.counter" to "true"
+in the feed entity's properties section. Metrics captured from each instance will be populated to the GraphDB for display on the UI.
+
+*Example:*
+<verbatim>
+<properties>
+        <property name="job.counter" value="true" />
+</properties>
+</verbatim>
+
 ---+++ Where is the feed path defined for File System Storage?
 
 It's defined in the feed xml within the location tag.
@@ -561,7 +580,7 @@ simple and basic. The falcon system look
 cut-off period. It then uses a scheduled messaging framework, like the one available in Apache ActiveMQ or Java's !DelayQueue, to schedule a message with a cut-off period. After the cut-off period the message is dequeued, and Falcon checks for changes in the feed data, which is recorded in HDFS in a latedata file by Falcon's "record-size" action. If any changes are detected, the workflow is rerun with the new set of feed data.
 
 *Example:*
-The late rerun policy can be configured in the process definition.
+For a process entity, the late rerun policy can be configured in the process definition.
 Falcon supports 3 policies, periodic, exp-backoff and final.
 Delay specifies how often the feed data should be checked for changes. One also needs to
 explicitly set, in late-input, the feed names that need to be checked for late data.
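+
+As an illustration, a late rerun configuration in a process definition might look like the following (a sketch; the
+element and attribute names are assumed from Falcon's process schema, and the input name and workflow path are illustrative):
+<verbatim>
+<late-process policy="exp-backoff" delay="hours(1)">
+    <late-input input="impressions" workflow-path="hdfs://namenode:8020/apps/falcon/late-workflow"/>
+</late-process>
+</verbatim>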
@@ -575,6 +594,16 @@ explicitly set the feed names in late-in
 *NOTE:* Feeds configured with table storage do not support late input data handling at this point. This will be
 made available in the near future.
 
+For a feed entity replication job, the default late data handling policy can be configured in the runtime.properties file.
+Since these are runtime properties, they take effect for all replication jobs run after the change.
+<verbatim>
+  # Default configs to handle replication for late arriving feeds.
+  *.feed.late.allowed=true
+  *.feed.late.frequency=hours(3)
+  *.feed.late.policy=exp-backoff
+</verbatim>
+
+
 ---++ Idempotency
 All the operations in Falcon are idempotent. That is, if you make the same request to the Falcon server / prism again, you will get a SUCCESSFUL response if it was SUCCESSFUL in the first attempt. For example, you submit a new process / feed and get a SUCCESSFUL response. If you run the same command / API request on the same entity again, you will again get a SUCCESSFUL response. The same is true for other operations like schedule, kill, suspend and resume.
 Idempotency also takes care of the case when a request is sent through prism and fails on one or more servers. For example, suppose prism is configured to send requests to 3 servers. First, a user sends a request to SUBMIT a process on all 3 of them, and receives a SUCCESSFUL response from each. Then, due to some issue, one of the servers goes down, and the user sends a request to schedule the submitted process. This time the user will receive a response with PARTIAL status and a FAILURE message from the server that has gone down. On checking, the user will find the process started and running on the 2 SUCCESSFUL servers. Once the issue with the failed server is fixed and it is brought back up, sending the SCHEDULE request again through prism will result in a SUCCESSFUL response from prism as well as all three servers, but this time the process will be SCHEDULED only on the server which had failed earlier; the other two will keep running as before.
@@ -711,6 +740,38 @@ Recipes is detailed in [[Recipes][Recipe
 
 Monitoring and Operationalizing Falcon is detailed in [[Operability][Operability]].
 
+---++ Email Notification
+Notification for instance completion in Falcon is defined in [[FalconEmailNotification][Falcon Email Notification]].
+
 ---++ Backwards Compatibility
 
 Backwards compatibility instructions are [[Compatibility][detailed here.]]
+
+---++ Proxyuser support
+Falcon supports impersonation or proxyuser functionality (identical to Hadoop proxyuser capabilities and conceptually
+similar to Unix 'sudo').
+
+Proxyuser enables Falcon clients to submit entities on behalf of other users. Falcon will utilize Hadoop core's hadoop-auth
+module to implement this functionality.
+
+Because proxyuser is a powerful capability, Falcon provides the following restriction capabilities (similar to Hadoop):
+
+   * Proxyuser is an explicit configuration on a per-proxyuser basis.
+   * A proxyuser can be restricted to impersonating other users only from a set of hosts.
+   * A proxyuser can be restricted to impersonating only users belonging to a set of groups.
+
+Two configuration properties are needed in runtime.properties to set up a proxyuser:
+   * falcon.service.ProxyUserService.proxyuser.#USER#.hosts: hosts from where the user #USER# can impersonate other users.
+   * falcon.service.ProxyUserService.proxyuser.#USER#.groups: groups the users being impersonated by user #USER# must belong to.
+
+If these configurations are not present, impersonation will not be allowed and the connection will fail. If more relaxed security is preferred,
+the wildcard value * may be used to allow impersonation from any host or of any user, although this is recommended only for testing/development.
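+
+For example, to allow the user "falcon-gateway" to impersonate only users in the group "analytics" and only from the
+host "gateway.example.com" (the user, group and host names here are illustrative; the "*." prefix follows the
+runtime.properties convention):
+<verbatim>
+*.falcon.service.ProxyUserService.proxyuser.falcon-gateway.hosts=gateway.example.com
+*.falcon.service.ProxyUserService.proxyuser.falcon-gateway.groups=analytics
+</verbatim>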
+
+To enable impersonation, pass the -doAs option via the CLI, or append the doAs query parameter when using the REST API.
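+
+For example, a CLI invocation impersonating user "joe" might look like the following (the entity name and user are
+illustrative):
+<verbatim>
+$FALCON_HOME/bin/falcon entity -type process -name sampleProcess -definition -doAs joe
+</verbatim>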
+
+---++ ImportExport
+
+Data Import and Export is detailed in [[ImportExport][Data Import and Export]].
+
+
+

Added: falcon/trunk/general/src/site/twiki/FalconEmailNotification.twiki
URL: http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/FalconEmailNotification.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/FalconEmailNotification.twiki (added)
+++ falcon/trunk/general/src/site/twiki/FalconEmailNotification.twiki Mon Feb 15 05:48:00 2016
@@ -0,0 +1,29 @@
+---++Falcon Email Notification
+
+Falcon Email notification allows sending email notifications when scheduled feed/process instances complete.
+Email notification in feed/process entity can be defined as follows:
+<verbatim>
+<process name="[process name]">
+    ...
+    <notification type="email" to="bob@xyz.com,tom@xyz.com"/>
+    ...
+</process>
+</verbatim>
+
+   *  *type*    - specifies the type of notification. *Note:* Currently only the "email" notification type is supported.
+   *  *to*  - specifies the address to send notifications to; multiple recipients may be provided as a comma-separated list.
+
+
+Falcon email notification requires some SMTP server configuration to be defined in startup.properties. The following are the values
+it looks for:
+   * *falcon.email.smtp.host*   - The host where the email action may find the SMTP server (localhost by default).
+   * *falcon.email.smtp.port*   - The port to connect to for the SMTP server (25 by default).
+   * *falcon.email.from.address*    - The from address to be used for mailing all emails (falcon@localhost by default).
+   * *falcon.email.smtp.auth*   - Boolean property that specifies whether authentication is to be done (false by default).
+   * *falcon.email.smtp.user*   - If authentication is enabled, the username to login as (empty by default).
+   * *falcon.email.smtp.password*   - If authentication is enabled, the username's password (empty by default).
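+
+For example, a minimal startup.properties fragment for an unauthenticated SMTP relay might look like the following
+(the host and addresses are illustrative; the "*." prefix follows the startup.properties convention used elsewhere in
+this commit):
+<verbatim>
+*.falcon.email.smtp.host=smtp.example.com
+*.falcon.email.smtp.port=25
+*.falcon.email.from.address=falcon@example.com
+*.falcon.email.smtp.auth=false
+</verbatim>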
+
+
+
+Also ensure that the email notification plugin is enabled in startup.properties so that email notifications are sent:
+   * *monitoring.plugins*   - org.apache.falcon.plugin.EmailNotificationPlugin,org.apache.falcon.plugin.DefaultMonitoringPlugin
\ No newline at end of file

Added: falcon/trunk/general/src/site/twiki/FalconNativeScheduler.twiki
URL: http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/FalconNativeScheduler.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/FalconNativeScheduler.twiki (added)
+++ falcon/trunk/general/src/site/twiki/FalconNativeScheduler.twiki Mon Feb 15 05:48:00 2016
@@ -0,0 +1,213 @@
+---+ Falcon Native Scheduler
+
+---++ Overview
+Falcon has been using Oozie as its scheduling engine. While the use of Oozie works reasonably well, there are scenarios where Oozie scheduling proves to be a limiting factor. In its current form, Falcon relies on Oozie for both scheduling and workflow execution, because of which scheduling is limited to time-based/cron-based scheduling with additional gating conditions on data availability. This also restricts datasets to being periodic in nature. In order to offer better scheduling capabilities, Falcon comes with its own native scheduler.
+
+---++ Capabilities
+The native scheduler will offer the capabilities offered by Oozie co-ordinator and more. The native scheduler will be built and released over the next few releases of Falcon giving users an opportunity to use it and provide feedback.
+
+Currently, the native scheduler offers the following capabilities:
+   1. Submit and schedule a Falcon process that runs periodically (without data dependency) - it could be a Pig script, an Oozie workflow or a Hive script (all the engine types currently supported).
+   1. Monitor/Query/Modify the scheduled process - all applicable entity APIs and instance APIs should work as they do now. Falcon provides data management functions for feeds declaratively. It allows users to represent feed locations as time-based partition directories on HDFS containing files.
+
+*NOTE: Execution order is FIFO. LIFO and LAST_ONLY are not supported yet.*
+
+In the near future, the Falcon scheduler will provide feature parity with the Oozie scheduler, and in subsequent releases will offer the following features:
+   * Periodic, cron-based, calendar-based scheduling.
+   * Data availability based scheduling.
+   * External trigger/notification based scheduling.
+   * Support for periodic/a-periodic datasets.
+   * Support for optional/mandatory datasets. Option to specify minimum/maximum/exactly-N instances of data to consume.
+   * Handle dependencies across entities during re-run.
+
+---++ Configuring Native Scheduler
+You can enable native scheduler by making changes to __$FALCON_HOME/conf/startup.properties__ as follows. You will need to restart Falcon Server for the changes to take effect.
+<verbatim>
+*.dag.engine.impl=org.apache.falcon.workflow.engine.OozieDAGEngine
+*.application.services=org.apache.falcon.security.AuthenticationInitializationService,\
+                        org.apache.falcon.workflow.WorkflowJobEndNotificationService, \
+                        org.apache.falcon.service.ProcessSubscriberService,\
+                        org.apache.falcon.service.FeedSLAMonitoringService,\
+                        org.apache.falcon.service.LifecyclePolicyMap,\
+                        org.apache.falcon.state.store.service.FalconJPAService,\
+                        org.apache.falcon.entity.store.ConfigurationStore,\
+                        org.apache.falcon.rerun.service.RetryService,\
+                        org.apache.falcon.rerun.service.LateRunService,\
+                        org.apache.falcon.metadata.MetadataMappingService,\
+                        org.apache.falcon.service.LogCleanupService,\
+                        org.apache.falcon.service.GroupsService,\
+                        org.apache.falcon.service.ProxyUserService,\
+                        org.apache.falcon.notification.service.impl.JobCompletionService,\
+                        org.apache.falcon.notification.service.impl.SchedulerService,\
+                        org.apache.falcon.notification.service.impl.AlarmService,\
+                        org.apache.falcon.notification.service.impl.DataAvailabilityService,\
+                        org.apache.falcon.execution.FalconExecutionService
+</verbatim>
+
+---+++ Making the Native Scheduler the default scheduler
+To ensure backward compatibility, even when the native scheduler is enabled, the default scheduler is still Oozie. This means users will be scheduling entities on the Oozie scheduler by default. They will need to explicitly specify the scheduler as native if they wish to schedule entities using the native scheduler.
+
+<a href="#Scheduling_new_entities_on_Native_Scheduler">This section</a> has more details on how to schedule on either of the schedulers. 
+
+If you wish to make the Falcon Native Scheduler your default scheduler and remove Oozie as the scheduler, set the following property in __$FALCON_HOME/conf/startup.properties__
+<verbatim>
+## If you wish to use Falcon native scheduler as your default scheduler, set the workflow engine to FalconWorkflowEngine instead of OozieWorkflowEngine. ##
+*.workflow.engine.impl=org.apache.falcon.workflow.engine.FalconWorkflowEngine
+</verbatim>
+
+---+++ Configuring the state store for Native Scheduler
+You can configure statestore by making changes to __$FALCON_HOME/conf/statestore.properties__ as follows. You will need to restart Falcon Server for the changes to take effect.
+
+Falcon Server needs to maintain state of the entities and instances in a persistent store for the system to be recoverable. Since Prism only federates, it does not need to maintain any state information. Following properties need to be set in statestore.properties of Falcon Servers:
+<verbatim>
+######### StateStore Properties #####
+*.falcon.state.store.impl=org.apache.falcon.state.store.jdbc.JDBCStateStore
+*.falcon.statestore.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
+*.falcon.statestore.jdbc.url=jdbc:derby:data/falcon.db
+# StateStore credentials file where username,password and other properties can be stored securely.
+# Set this credentials file permission 400 and make sure user who starts falcon should only have read permission.
+# Give Absolute path to credentials file along with file name or put in classpath with file name statestore.credentials.
+# Credentials file should be present either in given location or class path, otherwise falcon won't start.
+*.falcon.statestore.credentials.file=
+*.falcon.statestore.jdbc.username=sa
+*.falcon.statestore.jdbc.password=
+*.falcon.statestore.connection.data.source=org.apache.commons.dbcp.BasicDataSource
+# Maximum number of active connections that can be allocated from this pool at the same time.
+*.falcon.statestore.pool.max.active.conn=10
+*.falcon.statestore.connection.properties=
+# Indicates the interval (in milliseconds) between eviction runs.
+*.falcon.statestore.validate.db.connection.eviction.interval=300000
+## The number of objects to examine during each run of the idle object evictor thread.
+*.falcon.statestore.validate.db.connection.eviction.num=10
+## Creates Falcon DB.
+## If set to true, it creates the DB schema if it does not exist. If the DB schema exists is a NOP.
+## If set to false, it does not create the DB schema. If the DB schema does not exist it fails start up.
+*.falcon.statestore.create.db.schema=true
+</verbatim> 
+
+The _*.falcon.statestore.jdbc.url_ property in statestore.properties determines the DB and data location. All other properties are common across RDBMS.
+
+*NOTE : Although multiple Falcon Servers can share a DB (not applicable for Derby DB), it is recommended that you have different DBs for different Falcon Servers for better performance.*
+
+You will need to create the state DB and tables before starting the Falcon Server. A tool for creating the tables comes bundled with the Falcon installation. You can use the _falcon-db.sh_ script to create tables in the DB. The script needs to be run only for Falcon Servers and can be run by any user that has execute permission on the script. The script picks up the DB connection details from __$FALCON_HOME/conf/statestore.properties__. Ensure that you have granted the right privileges to the user mentioned in _statestore.properties_, so the tables can be created.
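+
+For example, on MySQL the DB and privileges could be set up as follows (the database name, user and password are
+illustrative; adjust them to match statestore.properties):
+<verbatim>
+mysql> CREATE DATABASE falcon;
+mysql> GRANT ALL PRIVILEGES ON falcon.* TO 'falcon'@'localhost' IDENTIFIED BY 'falcon-password';
+</verbatim>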
+
+You can use the help command to get details on the sub-commands supported:
+<verbatim>
+./bin/falcon-db.sh help
+Hadoop home is set, adding libraries from '/Users/pallavi.rao/falcon/hadoop-2.6.0/bin/hadoop classpath' into falcon classpath
+usage: 
+      Falcon DB initialization tool currently supports Derby DB/ Mysql
+
+      falcondb help : Display usage for all commands or specified command
+
+      falcondb version : Show Falcon DB version information
+
+      falcondb create <OPTIONS> : Create Falcon DB schema
+                      -run             Confirmation option regarding DB schema creation/upgrade
+                      -sqlfile <arg>   Generate SQL script instead of creating/upgrading the DB
+                                       schema
+
+      falcondb upgrade <OPTIONS> : Upgrade Falcon DB schema
+                       -run             Confirmation option regarding DB schema creation/upgrade
+                       -sqlfile <arg>   Generate SQL script instead of creating/upgrading the DB
+                                        schema
+
+</verbatim>
+Currently, MySQL and Derby are supported as state stores. We may extend support to other DBs in the future. Falcon has been tested against MySQL v5.5. If you are using MySQL, ensure you also copy mysql-connector-java-<version>.jar under __$FALCON_HOME/server/webapp/falcon/WEB-INF/lib__ and __$FALCON_HOME/client/lib__.
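+
+The copy step above might look like the following (the actual jar version depends on your installation):
+<verbatim>
+cp mysql-connector-java-<version>.jar $FALCON_HOME/server/webapp/falcon/WEB-INF/lib/
+cp mysql-connector-java-<version>.jar $FALCON_HOME/client/lib/
+</verbatim>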
+
+---++++ Using Derby as the State Store
+Using Derby is ideal for QA and staging setup. Falcon comes bundled with a Derby connector and no explicit setup is required (although you can set it up) in terms of creating the DB or tables.
+For example,
+ <verbatim> *.falcon.statestore.jdbc.url=jdbc:derby:data/falcon.db;create=true </verbatim>
+
+ tells Falcon to use the Derby JDBC connector, with the data directory $FALCON_HOME/data/ and DB name 'falcon'. If _create=true_ is specified, you will not need to create a DB up front; a database will be created if it does not exist.
+
+---++++ Using MySQL as the State Store
+The jdbc.url property in statestore.properties determines the DB and data location.
+For example,
+ <verbatim> *.falcon.statestore.jdbc.url=jdbc:mysql://localhost:3306/falcon </verbatim>
+
+ tells Falcon to use the MySQL JDBC connector, which is accessible @localhost:3306, with DB name 'falcon'.
+
+---++ Scheduling new entities on Native Scheduler
+To schedule an entity (currently only process is supported) using the native scheduler, you need to specify the scheduler in the schedule command as shown below:
+<verbatim>
+$FALCON_HOME/bin/falcon entity -type process -name <process name> -schedule -properties falcon.scheduler:native
+</verbatim>
+
+If Oozie is configured as the default scheduler, you can skip the scheduler option or explicitly set it to _oozie_, as shown below:
+<verbatim>
+$FALCON_HOME/bin/falcon entity -type process -name <process name> -schedule
+OR
+$FALCON_HOME/bin/falcon entity -type process -name <process name> -schedule -properties falcon.scheduler:oozie
+</verbatim>
+
+If the native scheduler is configured as the default scheduler, you can omit the scheduler option, as shown below:
+<verbatim>
+$FALCON_HOME/bin/falcon entity -type process -name <process name> -schedule 
+</verbatim>
+
+---++ Migrating entities from Oozie Scheduler to Native Scheduler
+Currently, users have to delete and re-create entities in order to move across schedulers. Attempting to schedule an already scheduled entity on a different scheduler will result in an error. Note that the history of instances prior to scheduling on the native scheduler will not be available via the instance APIs. However, users can retrieve that information using the metadata APIs. The native scheduler must be enabled before migrating entities to it.
+
+<a href="#Configuring_Native_Scheduler">Configuring Native Scheduler</a> has more details on how to enable native scheduler.
+
+---+++ Migrating from Oozie to Native Scheduler
+   * Delete the entity (process). 
+<verbatim>$FALCON_HOME/bin/falcon entity -type process -name <process name> -delete </verbatim>
+   * Submit the entity (process) with start time from where the Oozie scheduler left off. 
+<verbatim>$FALCON_HOME/bin/falcon entity -type process -submit <path to process xml> </verbatim>
+   * Schedule the entity on native scheduler. 
+<verbatim> $FALCON_HOME/bin/falcon entity -type process -name <process name> -schedule -properties falcon.scheduler:native </verbatim>
+
+---+++ Reverting to Oozie from Native Scheduler
+   * Delete the entity (process). 
+<verbatim>$FALCON_HOME/bin/falcon entity -type process -name <process name> -delete </verbatim>
+   * Submit the entity (process) with start time from where the Native scheduler left off. 
+<verbatim>$FALCON_HOME/bin/falcon entity -type process -submit <path to process xml> </verbatim>
+   * Schedule the entity on the default scheduler (Oozie).
+ <verbatim> $FALCON_HOME/bin/falcon entity -type process -name <process name> -schedule </verbatim>
+
+---+++ Differences in API responses between Oozie and Native Scheduler
+Most API responses are similar whether the entity is scheduled via Oozie or via Native scheduler. However, there are a few exceptions and those are listed below.
+---++++ Rerun API
+When a user performs a rerun using Oozie scheduler, Falcon directly reruns the workflow on Oozie and the instance will be moved to 'RUNNING'.
+
+Example response:
+<verbatim>
+$ falcon instance -rerun processMerlinOozie -start 2016-01-08T12:13Z -end 2016-01-08T12:15Z
+Consolidated Status: SUCCEEDED
+
+Instances:
+Instance		Cluster		SourceCluster		Status		Start		End		Details					Log
+-----------------------------------------------------------------------------------------------
+2016-01-08T12:13Z	ProcessMultipleClustersTest-corp-9706f068	-	RUNNING	2016-01-08T13:03Z	2016-01-08T13:03Z	-	http://8RPCG32.corp.inmobi.com:11000/oozie?job=0001811-160104160825636-oozie-oozi-W
+2016-01-08T12:13Z	ProcessMultipleClustersTest-corp-0b270a1d	-	RUNNING	2016-01-08T13:03Z	2016-01-08T13:03Z	-	http://lda01:11000/oozie?job=0002247-160104115615658-oozie-oozi-W
+
+Additional Information:
+Response: ua1/RERUN
+ua2/RERUN
+Request Id: ua1/871377866@qtp-630572412-35 - 7190c4c8-bacb-4639-8d48-c9e639f544da
+ua2/1554129706@qtp-536122141-13 - bc18127b-1bf8-4ea1-99e6-b1f10ba3a441
+</verbatim>
+
+However, when a user performs a rerun on the native scheduler, the instance is scheduled again. This is done intentionally so as not to exceed the limit on the number of instances running in parallel. Hence, the user will see the status of the instance as 'READY'.
+
+Example response:
+<verbatim>
+$ falcon instance -rerun ProcessMultipleClustersTest-agregator-coord16-8f55f59b -start 2016-01-08T12:13Z -end 2016-01-08T12:15Z
+Consolidated Status: SUCCEEDED
+
+Instances:
+Instance		Cluster		SourceCluster		Status		Start		End		Details					Log
+-----------------------------------------------------------------------------------------------
+2016-01-08T12:13Z	ProcessMultipleClustersTest-corp-9706f068	-	READY	2016-01-08T13:03Z	2016-01-08T13:03Z	-	http://8RPCG32.corp.inmobi.com:11000/oozie?job=0001812-160104160825636-oozie-oozi-W
+
+2016-01-08T12:13Z	ProcessMultipleClustersTest-corp-0b270a1d	-	READY	2016-01-08T13:03Z	2016-01-08T13:03Z	-	http://lda01:11000/oozie?job=0002248-160104115615658-oozie-oozi-W
+
+Additional Information:
+Response: ua1/RERUN
+ua2/RERUN
+Request Id: ua1/871377866@qtp-630572412-35 - 8d118d4d-c0ef-4335-a9af-10364498ec4f
+ua2/1554129706@qtp-536122141-13 - c2a3fc50-8b05-47ce-9c85-ca432b96d923
+</verbatim>

Added: falcon/trunk/general/src/site/twiki/HDFSDR.twiki
URL: http://svn.apache.org/viewvc/falcon/trunk/general/src/site/twiki/HDFSDR.twiki?rev=1730449&view=auto
==============================================================================
--- falcon/trunk/general/src/site/twiki/HDFSDR.twiki (added)
+++ falcon/trunk/general/src/site/twiki/HDFSDR.twiki Mon Feb 15 05:48:00 2016
@@ -0,0 +1,34 @@
+---+ HDFS DR Recipe
+---++ Overview
+Falcon supports an HDFS DR recipe to replicate data from a source cluster to a destination cluster.
+
+---++ Usage
+---+++ Setup cluster definition.
+   <verbatim>
+    $FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml
+   </verbatim>
+
+---+++ Update recipes properties
+   Copy the HDFS replication recipe properties, workflow and template files from $FALCON_HOME/data-mirroring/hdfs-replication to an accessible
+   directory path or to the recipe directory path (*falcon.recipe.path=<recipe directory path>*). *"falcon.recipe.path"* must be specified
+   in Falcon's conf/client.properties. Now update the copied recipe properties file with the attributes required to replicate data from the
+   source cluster to the destination cluster for HDFS DR.
+
+---+++ Submit HDFS DR recipe
+
+   After updating the recipe properties file with the required attributes, either in the directory path or in falcon.recipe.path,
+   there are two ways of submitting the HDFS DR recipe:
+
+   * 1. Specify the Falcon recipe properties file through the recipe command line.
+   <verbatim>
+    $FALCON_HOME/bin/falcon recipe -name hdfs-replication -operation HDFS_REPLICATION
+    -properties /cluster/hdfs-replication.properties
+   </verbatim>
+
+   * 2. Use the Falcon recipe path specified in Falcon's conf/client.properties.
+   <verbatim>
+    $FALCON_HOME/bin/falcon recipe -name hdfs-replication -operation HDFS_REPLICATION
+   </verbatim>
+
+
+*Note:* The recipe properties file, workflow file and template file names must match the recipe name, must be unique, and must reside in the same directory.



