falcon-commits mailing list archives

From ajayyad...@apache.org
Subject [09/15] falcon git commit: FALCON-1301 Improve documentation for Installation. Contributed by Pragya Mittal
Date Sat, 08 Aug 2015 14:40:25 GMT
FALCON-1301 Improve documentation for Installation. Contributed by Pragya Mittal


Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/ac5051e9
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/ac5051e9
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/ac5051e9

Branch: refs/heads/0.7
Commit: ac5051e96c03f24b276d54d65d3d5411dcaedfe7
Parents: 8cdac2b
Author: Ajay Yadava <ajaynsit@gmail.com>
Authored: Tue Aug 4 17:19:36 2015 +0530
Committer: Ajay Yadav <ajay.yadav@inmobi.com>
Committed: Sat Aug 8 20:06:41 2015 +0530

----------------------------------------------------------------------
 CHANGES.txt                                 |   2 +
 docs/src/site/twiki/Configuration.twiki     | 113 ++++++++
 docs/src/site/twiki/Distributed-mode.twiki  | 198 ++++++++++++++
 docs/src/site/twiki/Embedded-mode.twiki     | 198 ++++++++++++++
 docs/src/site/twiki/InstallationSteps.twiki | 326 +++--------------------
 5 files changed, 551 insertions(+), 286 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/falcon/blob/ac5051e9/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index e1eae4f..6148bc6 100755
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -11,6 +11,8 @@ Trunk (Unreleased)
     FALCON-796 Enable users to triage data processing issues through falcon (Ajay Yadava)
     
   IMPROVEMENTS
+    FALCON-1301 Improve documentation for Installation(Pragya Mittal via Ajay Yadava)
+
     FALCON-1322 Add prefix in runtime.properties(Sandeep Samudrala via Ajay Yadava)
 
     FALCON-1317 Inconsistent JSON serialization(Ajay Yadava)

http://git-wip-us.apache.org/repos/asf/falcon/blob/ac5051e9/docs/src/site/twiki/Configuration.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Configuration.twiki b/docs/src/site/twiki/Configuration.twiki
new file mode 100644
index 0000000..37b5717
--- /dev/null
+++ b/docs/src/site/twiki/Configuration.twiki
@@ -0,0 +1,113 @@
+---+Configuring Falcon
+
+By default, the config directory used by Falcon is {package dir}/conf. To override this (to use the same conf with multiple
+falcon upgrades), set environment variable FALCON_CONF to the path of the conf dir.
+
+falcon-env.sh has been added to the falcon conf. This file can be used to set various environment variables that you
+need for your services.
+In addition you can set any other environment variables you might need. This file will be sourced by falcon scripts
+before any commands are executed. The following environment variables are available to set.
+
+<verbatim>
+# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path
+#export JAVA_HOME=
+
+# any additional java opts you want to set. This will apply to both client and server operations
+#export FALCON_OPTS=
+
+# any additional java opts that you want to set for client only
+#export FALCON_CLIENT_OPTS=
+
+# java heap size we want to set for the client. Default is 1024MB
+#export FALCON_CLIENT_HEAP=
+
+# any additional opts you want to set for prism service.
+#export FALCON_PRISM_OPTS=
+
+# java heap size we want to set for the prism service. Default is 1024MB
+#export FALCON_PRISM_HEAP=
+
+# any additional opts you want to set for falcon service.
+#export FALCON_SERVER_OPTS=
+
+# java heap size we want to set for the falcon server. Default is 1024MB
+#export FALCON_SERVER_HEAP=
+
+# What is considered as falcon home dir. Default is the base location of the installed software
+#export FALCON_HOME_DIR=
+
+# Where log files are stored. Default is logs directory under the base install location
+#export FALCON_LOG_DIR=
+
+# Where pid files are stored. Default is logs directory under the base install location
+#export FALCON_PID_DIR=
+
+# where the falcon active mq data is stored. Default is logs/data directory under the base install location
+#export FALCON_DATA_DIR=
+
+# Where do you want to expand the war file. By Default it is in /server/webapp dir under the base install dir.
+#export FALCON_EXPANDED_WEBAPP_DIR=
+</verbatim>
+
+---++Advanced Configurations
+
+---+++Configuring Monitoring plugin to register catalog partitions
+Falcon comes with a monitoring plugin that registers catalog partitions. This comes in really handy during migration from
+ filesystem based feeds to hcatalog based feeds.
+This plugin enables the user to de-couple the partition registration and assume that all partitions are already on
+hcatalog even before the migration, simplifying the hcatalog migration.
+
+By default this plugin is disabled.
+To enable this plugin and leverage the feature, there are 3 pre-requisites:
+<verbatim>
+In {package dir}/conf/startup.properties, add
+*.workflow.execution.listeners=org.apache.falcon.catalog.CatalogPartitionHandler
+
+In the cluster definition, ensure registry endpoint is defined.
+Ex:
+<interface type="registry" endpoint="thrift://localhost:1109" version="0.13.3"/>
+
+In the feed definition, ensure the corresponding catalog table is mentioned in feed-properties
+Ex:
+<properties>
+    <property name="catalog.table" value="catalog:default:in_table#year={YEAR};month={MONTH};day={DAY};hour={HOUR};
+    minute={MINUTE}"/>
+</properties>
+</verbatim>
+
+*NOTE : for Mac OS users*
+<verbatim>
+If you are using a Mac OS, you will need to configure the FALCON_SERVER_OPTS (explained above).
+
+In  {package dir}/conf/falcon-env.sh uncomment the following line
+#export FALCON_SERVER_OPTS=
+
+and change it to look as below
+export FALCON_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
+</verbatim>
+
+
+---+++Activemq
+
+* falcon server starts embedded active mq. To control this behaviour, set the following system properties using -D
+option in environment variable FALCON_OPTS:
+   * falcon.embeddedmq=<true/false> - Should server start embedded active mq, default true
+   * falcon.embeddedmq.port=<port> - Port for embedded active mq, default 61616
+   * falcon.embeddedmq.data=<path> - Data path for embedded active mq, default {package dir}/logs/data
+
+---+++Adding Extension Libraries
+
+Library extensions allow users to add custom libraries to entity lifecycles such as feed retention, feed replication
+and process execution. This is useful for use cases such as adding filesystem extensions. To enable this, add the
+following configs to startup.properties:
+*.libext.paths=<paths to be added to all entity lifecycles>
+
+*.libext.feed.paths=<paths to be added to all feed lifecycles>
+
+*.libext.feed.retentions.paths=<paths to be added to feed retention workflow>
+
+*.libext.feed.replication.paths=<paths to be added to feed replication workflow>
+
+*.libext.process.paths=<paths to be added to process workflow>
+
+The configured jars are added to falcon classpath and the corresponding workflows.
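
As a rough illustration of the configuration notes above, a falcon-env.sh fragment might look like the sketch below. The FALCON_CONF and FALCON_OPTS variables and the three falcon.embeddedmq properties come from the text; the directory paths are hypothetical placeholders and should be replaced with paths from your own install.

```shell
# Sketch of a falcon-env.sh fragment (paths are hypothetical, not Falcon defaults).

# Keep the conf dir outside the package so it survives Falcon upgrades.
export FALCON_CONF="${FALCON_CONF:-/etc/falcon/conf}"

# Embedded ActiveMQ is controlled through -D system properties in FALCON_OPTS.
mq_port=61616                        # documented default port
mq_data="/var/lib/falcon/mq-data"    # hypothetical data path
export FALCON_OPTS="-Dfalcon.embeddedmq=true -Dfalcon.embeddedmq.port=${mq_port} -Dfalcon.embeddedmq.data=${mq_data}"

# Show what the falcon scripts would pick up after sourcing this file.
echo "FALCON_CONF=${FALCON_CONF}"
echo "FALCON_OPTS=${FALCON_OPTS}"
```

Because the file is sourced by the falcon scripts before any command runs, plain export statements like these are enough; no restart wrapper is needed beyond restarting the service itself.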

http://git-wip-us.apache.org/repos/asf/falcon/blob/ac5051e9/docs/src/site/twiki/Distributed-mode.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Distributed-mode.twiki b/docs/src/site/twiki/Distributed-mode.twiki
new file mode 100644
index 0000000..617ab51
--- /dev/null
+++ b/docs/src/site/twiki/Distributed-mode.twiki
@@ -0,0 +1,198 @@
+---+Distributed Mode
+
+
+Following are the steps needed to package and deploy Falcon in Distributed Mode. You need to complete Steps 1-3 mentioned
+ [[InstallationSteps][here]] before proceeding further.
+
+---++Package Falcon
+Ensure that you are in the base directory (where you cloned Falcon). Let's call it {project dir}
+
+<verbatim>
+$mvn clean assembly:assembly -DskipTests -DskipCheck=true -Pdistributed,hadoop-2
+</verbatim>
+
+
+<verbatim>
+$ls {project dir}/target/
+</verbatim>
+
+It should give an output like below :
+<verbatim>
+apache-falcon-distributed-${project.version}-server.tar.gz
+apache-falcon-distributed-${project.version}-sources.tar.gz
+archive-tmp
+maven-shared-archive-resources
+</verbatim>
+
+   * apache-falcon-distributed-${project.version}-sources.tar.gz contains source files of Falcon repo.
+
+   * apache-falcon-distributed-${project.version}-server.tar.gz package contains project artifacts along with its
+dependencies, configuration files and scripts required to deploy Falcon.
+
+
+Tar can be found in {project dir}/target/apache-falcon-distributed-${project.version}-server.tar.gz. This is the tar
+used for installing Falcon. Let's call it {falcon package}
+
+Tar is structured as follows.
+
+<verbatim>
+
+|- bin
+   |- falcon
+   |- falcon-start
+   |- falcon-stop
+   |- falcon-status
+   |- falcon-config.sh
+   |- service-start.sh
+   |- service-stop.sh
+   |- service-status.sh
+   |- prism-stop
+   |- prism-start
+   |- prism-status
+|- conf
+   |- startup.properties
+   |- runtime.properties
+   |- client.properties
+   |- prism.keystore
+   |- log4j.xml
+   |- falcon-env.sh
+|- docs
+|- client
+   |- lib (client support libs)
+|- server
+   |- webapp
+      |- falcon.war
+      |- prism.war
+|- oozie
+   |- conf
+   |- libext
+|- hadooplibs
+|- README
+|- NOTICE.txt
+|- LICENSE.txt
+|- DISCLAIMER.txt
+|- CHANGES.txt
+</verbatim>
+
+
+---++Installing & running Falcon
+
+---+++Installing Falcon
+
+Running Falcon in distributed mode requires bringing up both prism and server. As the name suggests, Falcon prism splits
+the request it gets to the Falcon servers. It is a good practice to start prism and server with their corresponding
+configurations separately. Create separate directories for prism and server. Let's call them {falcon-prism-dir} and
+{falcon-server-dir} respectively.
+
+*For prism*
+<verbatim>
+$mkdir {falcon-prism-dir}
+$tar -xzvf {falcon package} -C {falcon-prism-dir}
+</verbatim>
+
+*For server*
+<verbatim>
+$mkdir {falcon-server-dir}
+$tar -xzvf {falcon package} -C {falcon-server-dir}
+</verbatim>
+
+
+---+++Starting Prism
+
+<verbatim>
+$cd {falcon-prism-dir}/falcon-distributed-${project.version}
+$bin/prism-start [-port <port>]
+</verbatim>
+
+By default,
+* prism server starts at port 16443. To change the port, use -port option
+
+* falcon.enableTLS can be set to true or false explicitly to enable or disable SSL. If it is not set, a port that ends with 443 will
+automatically put prism on https://
+
+* prism starts with conf from {falcon-prism-dir}/falcon-distributed-${project.version}/conf. To override this (to use
+the same conf with multiple prism upgrades), set environment variable FALCON_CONF to the path of conf dir. You can find
+the instructions for configuring Falcon [[Configuration][here]].
+
+*Enabling prism-client*
+If prism is not started using default-port 16443 then edit the following property in
+{falcon-prism-dir}/falcon-distributed-${project.version}/conf/client.properties
+falcon.url=http://{machine-ip}:{prism-port}/
+
+
+---+++Starting Falcon Server
+
+<verbatim>
+$cd {falcon-server-dir}/falcon-distributed-${project.version}
+$bin/falcon-start [-port <port>]
+</verbatim>
+
+By default,
+* If falcon.enableTLS is set to true explicitly or not set at all, Falcon starts at port 15443 on https://.
+
+* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 on http://.
+
+* To change the port, use -port option.
+
+* If falcon.enableTLS is not set explicitly, a port that ends with 443 will automatically put Falcon on https://. Any
+other port will put Falcon on http://.
+
+* server starts with conf from {falcon-server-dir}/falcon-distributed-${project.version}/conf. To override this (to use
+the same conf with multiple server upgrades), set environment variable FALCON_CONF to the path of conf dir. You can find
+ the instructions for configuring Falcon [[Configuration][here]].
+
+*Enabling server-client*
+If server is not started using default-port 15443 then edit the following property in
+{falcon-server-dir}/falcon-distributed-${project.version}/conf/client.properties. You can find the instructions for
+configuring Falcon [[Configuration][here]].
+falcon.url=http://{machine-ip}:{server-port}/
+
+*NOTE* : https is the secure version of HTTP, the protocol over which data is sent between your browser and the website
+that you are connected to. By default Falcon runs in https mode, but the user can configure it to run in http mode.
+
+
+---+++Using Falcon
+
+<verbatim>
+$cd {falcon-prism-dir}/falcon-distributed-${project.version}
+$bin/falcon admin -version
+Falcon server build version: {Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",
+Mode:"embedded"}
+
+$bin/falcon help
+(for more details about Falcon cli usage)
+</verbatim>
+
+
+---+++Dashboard
+
+Once Falcon / prism is started, you can view the status of Falcon entities using the Web-based dashboard. You can open
+your browser at the corresponding port to use the web UI.
+
+Falcon dashboard makes the REST api calls as user "falcon-dashboard". If this user does not exist on your Falcon and
+Oozie servers, please create the user.
+
+<verbatim>
+## create user.
+[root@falconhost ~] useradd -U -m falcon-dashboard -G users
+
+## verify user is created with membership in correct groups.
+[root@falconhost ~] groups falcon-dashboard
+falcon-dashboard : falcon-dashboard users
+[root@falconhost ~]
+</verbatim>
+
+
+---+++Stopping Falcon Server
+
+<verbatim>
+$cd {falcon-server-dir}/falcon-distributed-${project.version}
+$bin/falcon-stop
+</verbatim>
+
+---+++Stopping Falcon Prism
+
+<verbatim>
+$cd {falcon-prism-dir}/falcon-distributed-${project.version}
+$bin/prism-stop
+</verbatim>
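
The port and falcon.enableTLS rules described above can be summarised in a small shell sketch. The helper names falcon_scheme and falcon_url are hypothetical (they are not part of the Falcon scripts); they only encode the documented rule that an explicit falcon.enableTLS value wins and that otherwise a port ending in 443 selects https. The result is the kind of value one might place in falcon.url in client.properties.

```shell
# Hypothetical helpers, not shipped with Falcon: they mirror the documented rules only.

# falcon_scheme PORT [ENABLE_TLS]
# ENABLE_TLS is "true", "false", or empty/omitted when falcon.enableTLS is not set.
falcon_scheme() {
  port="$1"
  tls="${2:-}"
  case "$tls" in
    true)  echo "https" ;;
    false) echo "http" ;;
    *)
      # Not set explicitly: a port ending in 443 means https, anything else http.
      case "$port" in
        *443) echo "https" ;;
        *)    echo "http" ;;
      esac
      ;;
  esac
}

# falcon_url HOST PORT [ENABLE_TLS]: candidate value for falcon.url
falcon_url() {
  echo "$(falcon_scheme "$2" "${3:-}")://$1:$2/"
}

falcon_url localhost 16443        # prism on its default port
falcon_url localhost 15000 false  # server with TLS explicitly disabled
```

The sketch deliberately derives the scheme rather than hard-coding http, since a client URL whose scheme disagrees with the server's mode will not connect.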

http://git-wip-us.apache.org/repos/asf/falcon/blob/ac5051e9/docs/src/site/twiki/Embedded-mode.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Embedded-mode.twiki b/docs/src/site/twiki/Embedded-mode.twiki
new file mode 100644
index 0000000..96ae8ab
--- /dev/null
+++ b/docs/src/site/twiki/Embedded-mode.twiki
@@ -0,0 +1,198 @@
+---+Embedded Mode
+
+Following are the steps needed to package and deploy Falcon in Embedded Mode. You need to complete Steps 1-3 mentioned
+ [[InstallationSteps][here]] before proceeding further.
+
+---++Package Falcon
+Ensure that you are in the base directory (where you cloned Falcon). Let's call it {project dir}
+
+<verbatim>
+$mvn clean assembly:assembly -DskipTests -DskipCheck=true
+</verbatim>
+
+<verbatim>
+$ls {project dir}/target/
+</verbatim>
+It should give an output like below :
+<verbatim>
+apache-falcon-${project.version}-bin.tar.gz
+apache-falcon-${project.version}-sources.tar.gz
+archive-tmp
+maven-shared-archive-resources
+</verbatim>
+
+* apache-falcon-${project.version}-sources.tar.gz contains source files of Falcon repo.
+
+* apache-falcon-${project.version}-bin.tar.gz package contains project artifacts along with its dependencies,
+configuration files and scripts required to deploy Falcon.
+
+Tar can be found in {project dir}/target/apache-falcon-${project.version}-bin.tar.gz
+
+Tar is structured as follows :
+
+<verbatim>
+
+|- bin
+   |- falcon
+   |- falcon-start
+   |- falcon-stop
+   |- falcon-status
+   |- falcon-config.sh
+   |- service-start.sh
+   |- service-stop.sh
+   |- service-status.sh
+|- conf
+   |- startup.properties
+   |- runtime.properties
+   |- prism.keystore
+   |- client.properties
+   |- log4j.xml
+   |- falcon-env.sh
+|- docs
+|- client
+   |- lib (client support libs)
+|- server
+   |- webapp
+      |- falcon.war
+|- data
+   |- falcon-store
+   |- graphdb
+   |- localhost
+|- examples
+   |- app
+      |- hive
+      |- oozie-mr
+      |- pig
+   |- data
+   |- entity
+      |- filesystem
+      |- hcat
+|- oozie
+   |- conf
+   |- libext
+|- logs
+|- hadooplibs
+|- README
+|- NOTICE.txt
+|- LICENSE.txt
+|- DISCLAIMER.txt
+|- CHANGES.txt
+</verbatim>
+
+
+---++Installing & running Falcon
+
+Running Falcon in embedded mode requires bringing up server.
+
+<verbatim>
+$tar -xzvf {falcon package}
+$cd falcon-${project.version}
+</verbatim>
+
+
+---+++Starting Falcon Server
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-start [-port <port>]
+</verbatim>
+
+By default,
+* If falcon.enableTLS is set to true explicitly or not set at all, Falcon starts at port 15443 on https://.
+
+* If falcon.enableTLS is set to false explicitly, Falcon starts at port 15000 on http://.
+
+* To change the port, use -port option.
+
+* If falcon.enableTLS is not set explicitly, a port that ends with 443 will automatically put Falcon on https://. Any
+other port will put Falcon on http://.
+
+* Server starts with conf from {falcon-server-dir}/falcon-${project.version}/conf. To override this (to use
+the same conf with multiple server upgrades), set environment variable FALCON_CONF to the path of conf dir. You can find
+ the instructions for configuring Falcon [[Configuration][here]].
+
+
+---+++Enabling server-client
+If server is not started using default-port 15443 then edit the following property in
+{falcon-server-dir}/falcon-${project.version}/conf/client.properties
+
+falcon.url=http://{machine-ip}:{server-port}/
+
+
+---+++Using Falcon
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon admin -version
+Falcon server build version: {Version:"${project.version}-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",Mode:
+"embedded",Hadoop:"${hadoop.version}"}
+
+$bin/falcon help
+(for more details about Falcon cli usage)
+</verbatim>
+
+*Note* : https is the secure version of HTTP, the protocol over which data is sent between your browser and the website
+that you are connected to. By default Falcon runs in https mode, but the user can configure it to run in http mode.
+
+
+---+++Dashboard
+
+Once Falcon server is started, you can view the status of Falcon entities using the Web-based dashboard. You can open
+your browser at the corresponding port to use the web UI.
+
+Falcon dashboard makes the REST api calls as user "falcon-dashboard". If this user does not exist on your Falcon and
+Oozie servers, please create the user.
+
+<verbatim>
+## create user.
+[root@falconhost ~] useradd -U -m falcon-dashboard -G users
+
+## verify user is created with membership in correct groups.
+[root@falconhost ~] groups falcon-dashboard
+falcon-dashboard : falcon-dashboard users
+[root@falconhost ~]
+</verbatim>
+
+
+---++Running Examples using embedded package
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-start
+</verbatim>
+Make sure the Hadoop and Oozie endpoints match your setup in
+examples/entity/filesystem/standalone-cluster.xml
+The cluster locations, staging and working dirs, MUST be created prior to submitting a cluster entity to Falcon.
+*staging* must have 777 permissions and the parent dirs must have execute permissions
+*working* must have 755 permissions and the parent dirs must have execute permissions
+<verbatim>
+$bin/falcon entity -submit -type cluster -file examples/entity/filesystem/standalone-cluster.xml
+</verbatim>
+Submit input and output feeds:
+<verbatim>
+$bin/falcon entity -submit -type feed -file examples/entity/filesystem/in-feed.xml
+$bin/falcon entity -submit -type feed -file examples/entity/filesystem/out-feed.xml
+</verbatim>
+Set-up workflow for the process:
+<verbatim>
+$hadoop fs -put examples/app /
+</verbatim>
+Submit and schedule the process:
+<verbatim>
+$bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/oozie-mr-process.xml
+$bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/pig-process.xml
+</verbatim>
+Generate input data:
+<verbatim>
+$examples/data/generate.sh <<hdfs endpoint>>
+</verbatim>
+Get status of instances:
+<verbatim>
+$bin/falcon instance -status -type process -name oozie-mr-process -start 2013-11-15T00:05Z -end 2013-11-15T01:00Z
+</verbatim>
+
+HCat based example entities are in examples/entity/hcat.
+
+
+---+++Stopping Falcon Server
+<verbatim>
+$cd falcon-${project.version}
+$bin/falcon-stop
+</verbatim>
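
The examples section requires the staging and working dirs to exist with 777 and 755 permissions before the cluster entity is submitted. A minimal sketch of that preparation step follows. The function name prepare_cluster_dirs and the /apps/falcon/... paths are assumptions for illustration; the real locations are whatever standalone-cluster.xml declares. By default the helper only prints the hadoop fs commands; setting RUN=1 would execute them.

```shell
# Hypothetical helper: prints (or, with RUN=1, runs) the HDFS commands that create
# the cluster staging and working dirs with the permissions the docs require.
# Paths are assumed to contain no spaces.
prepare_cluster_dirs() {
  staging="$1"
  working="$2"
  for cmd in \
    "hadoop fs -mkdir -p $staging $working" \
    "hadoop fs -chmod 777 $staging" \
    "hadoop fs -chmod 755 $working"
  do
    if [ "${RUN:-0}" = "1" ]; then
      $cmd          # execute against the configured HDFS
    else
      echo "$cmd"   # dry run: show the command only
    fi
  done
}

# Dry run with assumed example paths; take the real ones from standalone-cluster.xml.
prepare_cluster_dirs /apps/falcon/staging /apps/falcon/working
```

The parent directories still need execute permission for the submitting user, which this sketch does not check.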

http://git-wip-us.apache.org/repos/asf/falcon/blob/ac5051e9/docs/src/site/twiki/InstallationSteps.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/InstallationSteps.twiki b/docs/src/site/twiki/InstallationSteps.twiki
index 1dd242a..3dd034b 100644
--- a/docs/src/site/twiki/InstallationSteps.twiki
+++ b/docs/src/site/twiki/InstallationSteps.twiki
@@ -1,322 +1,76 @@
----++ Building & Installing Falcon
+---+Building & Installing Falcon
 
 
----+++ Building Falcon
+---++Building Falcon
 
-<verbatim>
-You would need the following installed to build Falcon
-
-* JDK 1.7
-* Maven 3.x
-
-git clone https://git-wip-us.apache.org/repos/asf/falcon.git falcon
-
-cd falcon
-
-export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m -noverify" && mvn clean install
-
-[optionally -Dhadoop.version=<<hadoop.version>> can be appended to build for
a specific version of hadoop]
-*Note:* Falcon drops support for Hadoop-1 and only supports Hadoop-2 from Falcon 0.6 onwards
-[optionally -Doozie.version=<<oozie version>> can be appended to build with a
specific version of oozie.
-Oozie versions >= 4 are supported]
-Falcon build with JDK 1.7 using -noverify option
-
-</verbatim>
-
-Once the build successfully completes, artifacts can be packaged for deployment. The package
can be built in embedded or distributed mode.
-
-*Embedded Mode*
-<verbatim>
-
-mvn clean assembly:assembly -DskipTests -DskipCheck=true
-
-</verbatim>
-
-Tar can be found in {project dir}/target/apache-falcon-${project.version}-bin.tar.gz
-
-Tar is structured as follows
-
-<verbatim>
-
-|- bin
-   |- falcon
-   |- falcon-start
-   |- falcon-stop
-   |- falcon-config.sh
-   |- service-start.sh
-   |- service-stop.sh
-|- conf
-   |- startup.properties
-   |- runtime.properties
-   |- client.properties
-   |- log4j.xml
-   |- falcon-env.sh
-|- docs
-|- client
-   |- lib (client support libs)
-|- server
-   |- webapp
-      |- falcon.war
-|- hadooplibs
-|- README
-|- NOTICE.txt
-|- LICENSE.txt
-|- DISCLAIMER.txt
-|- CHANGES.txt
-</verbatim>
-
-*Distributed Mode*
-
-<verbatim>
-
-mvn clean assembly:assembly -DskipTests -DskipCheck=true -Pdistributed,hadoop-2
-
-</verbatim>
-
-Tar can be found in {project dir}/target/apache-falcon-distributed-${project.version}-server.tar.gz
-
-Tar is structured as follows
-
-<verbatim>
-
-|- bin
-   |- falcon
-   |- falcon-start
-   |- falcon-stop
-   |- falcon-config.sh
-   |- service-start.sh
-   |- service-stop.sh
-   |- prism-stop
-   |- prism-start
-|- conf
-   |- startup.properties
-   |- runtime.properties
-   |- client.properties
-   |- log4j.xml
-   |- falcon-env.sh
-|- docs
-|- client
-   |- lib (client support libs)
-|- server
-   |- webapp
-      |- falcon.war
-      |- prism.war
-|- hadooplibs
-|- README
-|- NOTICE.txt
-|- LICENSE.txt
-|- DISCLAIMER.txt
-|- CHANGES.txt
-</verbatim>
-
----+++ Installing & running Falcon
-
-*Installing falcon*
-<verbatim>
-tar -xzvf {falcon package}
-cd falcon-distributed-${project.version} or falcon-${project.version}
-</verbatim>
-
-*Configuring Falcon*
-
-By default config directory used by falcon is {package dir}/conf. To override this set environment
variable FALCON_CONF to the path of the conf dir.
-
-falcon-env.sh has been added to the falcon conf. This file can be used to set various environment
variables that you need for you services.
-In addition you can set any other environment variables you might need. This file will be
sourced by falcon scripts before any commands are executed. The following environment variables
are available to set.
-
-<verbatim>
-# The java implementation to use. If JAVA_HOME is not found we expect java and jar to be
in path
-#export JAVA_HOME=
-
-# any additional java opts you want to set. This will apply to both client and server operations
-#export FALCON_OPTS=
-
-# any additional java opts that you want to set for client only
-#export FALCON_CLIENT_OPTS=
-
-# java heap size we want to set for the client. Default is 1024MB
-#export FALCON_CLIENT_HEAP=
+---+++Prerequisites
 
-# any additional opts you want to set for prism service.
-#export FALCON_PRISM_OPTS=
+   * JDK 1.7
+   * Maven 3.x
 
-# java heap size we want to set for the prism service. Default is 1024MB
-#export FALCON_PRISM_HEAP=
 
-# any additional opts you want to set for falcon service.
-#export FALCON_SERVER_OPTS=
 
-# java heap size we want to set for the falcon server. Default is 1024MB
-#export FALCON_SERVER_HEAP=
-
-# What is is considered as falcon home dir. Default is the base location of the installed
software
-#export FALCON_HOME_DIR=
-
-# Where log files are stored. Default is logs directory under the base install location
-#export FALCON_LOG_DIR=
-
-# Where pid files are stored. Default is logs directory under the base install location
-#export FALCON_PID_DIR=
-
-# where the falcon active mq data is stored. Default is logs/data directory under the base
install location
-#export FALCON_DATA_DIR=
-
-# Where do you want to expand the war file. By Default it is in /server/webapp dir under
the base install dir.
-#export FALCON_EXPANDED_WEBAPP_DIR=
-</verbatim>
-
-*Configuring Monitoring plugin to register catalog partitions*
-Falcon comes with a monitoring plugin that registers catalog partition. This comes in really
handy during migration from filesystem based feeds to hcatalog based feeds.
-This plugin enables the user to de-couple the partition registration and assume that all
partitions are already on hcatalog even before the migration, simplifying the hcatalog migration.
-
-By default this plugin is disabled.
-To enable this plugin and leverage the feature, there are 3 pre-requisites:
+---+++Step 1 - Clone the Falcon repository
 
 <verbatim>
-In {package dir}/conf/startup.properties, add
-*.workflow.execution.listeners=org.apache.falcon.catalog.CatalogPartitionHandler
-
-In the cluster definition, ensure registry endpoint is defined.
-Ex:
-<interface type="registry" endpoint="thrift://localhost:1109" version="0.13.3"/>
-
-In the feed definition, ensure the corresponding catalog table is mentioned in feed-properties
-Ex:
-<properties>
-    <property name="catalog.table" value="catalog:default:in_table#year={YEAR};month={MONTH};day={DAY};hour={HOUR};minute={MINUTE}"/>
-</properties>
+$git clone https://git-wip-us.apache.org/repos/asf/falcon.git falcon
 </verbatim>
 
-*NOTE for Mac OS users*
-<verbatim>
-If you are using a Mac OS, you will need to configure the FALCON_SERVER_OPTS (explained above).
-
-In  {package dir}/conf/falcon-env.sh uncomment the following line
-#export FALCON_SERVER_OPTS=
 
-and change it to look as below
-export FALCON_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
-</verbatim>
+---+++Step 2 - Build Falcon
 
-*Starting Falcon Server*
 <verbatim>
-bin/falcon-start [-port <port>]
+$cd falcon
+$export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m -noverify" && mvn clean install
 </verbatim>
+It builds and installs the package into the local repository, for use as a dependency in other projects locally.
 
-By default,
-* If falcon.enableTLS is set to true explicitly or not set at all, falcon starts at port
15443 on https:// by default.
-* If falcon.enableTLS is set to false explicitly, falcon starts at port 15000 on http://.
-* To change the port, use -port option.
-   * If falcon.enableTLS is not set explicitly, port that ends with 443 will automatically
put falcon on https://. Any other port will put falcon on http://.
-* falcon server starts embedded active mq. To control this behaviour, set the following system
properties using -D option in environment variable FALCON_OPTS:
-   * falcon.embeddedmq=<true/false> - Should server start embedded active mq, default
true
-   * falcon.embeddedmq.port=<port> - Port for embedded active mq, default 61616
-   * falcon.embeddedmq.data=<path> - Data path for embedded active mq, default {package
dir}/logs/data
-* falcon server starts with conf from {package dir}/conf. To override this (to use the same
conf with multiple falcon upgrades), set environment variable FALCON_CONF to the path of conf
dir
+[optionally -Dhadoop.version=<<hadoop.version>> can be appended to build for a specific version of Hadoop]
 
-__Adding Extension Libraries__
-Library extensions allows users to add custom libraries to entity lifecycles such as feed
retention, feed replication and process execution. This is useful for usecases such as adding
filesystem extensions. To enable this, add the following configs to startup.properties:
-*.libext.paths=<paths to be added to all entity lifecycles>
-*.libext.feed.paths=<paths to be added to all feed lifecycles>
-*.libext.feed.retentions.paths=<paths to be added to feed retention workflow>
-*.libext.feed.replication.paths=<paths to be added to feed replication workflow>
-*.libext.process.paths=<paths to be added to process workflow>
+*NOTE:* Falcon drops support for Hadoop-1 and only supports Hadoop-2 from Falcon 0.6 onwards
+[optionally -Doozie.version=<<oozie version>> can be appended to build with a specific version of Oozie. Oozie versions
+>= 4 are supported]
+NOTE: Falcon builds with JDK 1.7 using -noverify option
 
-The configured jars are added to falcon classpath and the corresponding workflows
 
 
-*Starting Prism*
-<verbatim>
-bin/prism-start [-port <port>]
-</verbatim>
+---+++Step 3 - Package and Deploy Falcon
 
-By default, 
-* prism server starts at port 16443. To change the port, use -port option
-   * falcon.enableTLS can be set to true or false explicitly to enable SSL, if not port that
end with 443 will automatically put prism on https://
-* prism starts with conf from {package dir}/conf. To override this (to use the same conf
with multiple prism upgrades), set environment variable FALCON_CONF to the path of conf dir
+Once the build successfully completes, artifacts can be packaged for deployment using the assembly plugin. The Assembly
+Plugin for Maven is primarily intended to allow users to aggregate the project output along with its dependencies,
+modules, site documentation, and other files into a single distributable archive. There are two basic ways in which you
+can deploy Falcon - Embedded mode (also known as Stand Alone Mode) and Distributed mode. Your next steps will vary based
+on the mode in which you want to deploy Falcon.
 
-*Using Falcon*
-<verbatim>
-bin/falcon admin -version
-Falcon server build version: {Version:"0.3-SNAPSHOT-rd7e2be9afa2a5dc96acd1ec9e325f39c6b2f17f7",Mode:"embedded"}
+*NOTE* : Oozie is being extended by Falcon (particularly on el-extensions) and hence the need for Falcon to build &
+re-package Oozie, so that users of Falcon can work with the right Oozie setup. Though Oozie is packaged by Falcon, it
+needs to be deployed separately by the administrator and is not auto deployed along with Falcon.
 
-----
 
-bin/falcon help
-(for more details about falcon cli usage)
-</verbatim>
+---++++Embedded/Stand Alone Mode
+Embedded mode is useful when the Hadoop jobs and relevant data processing involve only one Hadoop cluster. In this mode
+ there is a single Falcon server that contacts the scheduler to schedule jobs on Hadoop. All the process/feed requests
+ like submit, schedule, suspend, kill etc. are sent to this server. For running Falcon in this mode one should use the
+ Falcon which has been built using standalone option. You can find the instructions for Embedded mode setup
+ [[Embedded-mode][here]].
 
-*Dashboard*
 
-Once falcon / prism is started, you can view the status of falcon entities using the Web-based
dashboard. The web UI works in both distributed and embedded mode. You can open your browser
at the corresponding port to use the web UI.
-
-Falcon dashboard makes the REST api calls as user "falcon-dashboard". If this user does not
exist on your falcon and oozie servers, please create the user.
-
-<verbatim>
-## create user.
-[root@falconhost ~] useradd -U -m falcon-dashboard -G users
-
-## verify user is created with membership in correct groups.
-[root@falconhost ~] groups falcon-dashboard
-falcon-dashboard : falcon-dashboard users
-[root@falconhost ~]
-</verbatim>
+---++++Distributed Mode
+Distributed mode is for multiple (colos) instances of Hadoop clusters, and multiple workflow schedulers to handle them.
+In this mode Falcon has 2 components: Prism and Server(s). Both Prism and Server(s) have their own config
+locations (startup and runtime properties). In this mode Prism acts as a contact point for Falcon servers. While
+ all commands are available through Prism, only read and instance APIs are available through Server. You can find the
+ instructions for Distributed Mode setup [[Distributed-mode][here]].
 
-*Stopping Falcon Server*
-<verbatim>
-bin/falcon-stop
-</verbatim>
 
-*Stopping Prism*
-<verbatim>
-bin/prism-stop
-</verbatim>
 
----+++ Preparing Oozie and Falcon packages for deployment
+---+++Preparing Oozie and Falcon packages for deployment
 <verbatim>
-cd <<project home>>
-src/bin/package.sh <<hadoop-version>> <<oozie-version>>
+$cd <<project home>>
+$src/bin/package.sh <<hadoop-version>> <<oozie-version>>
 
 >> ex. src/bin/package.sh 1.1.2 4.0.1 or src/bin/package.sh 0.20.2-cdh3u5 4.0.1
 >> ex. src/bin/package.sh 2.5.0 4.0.0
 >> Falcon package is available in <<falcon home>>/target/apache-falcon-<<version>>-bin.tar.gz
 >> Oozie package is available in <<falcon home>>/target/oozie-4.0.1-distro.tar.gz
 </verbatim>
-
----+++ Running Examples using embedded package
-<verbatim>
-bin/falcon-start
-</verbatim>
-Make sure the hadoop and oozie endpoints are according to your setup in examples/entity/filesystem/standalone-cluster.xml
-The cluster locations,staging and working dirs, MUST be created prior to submitting a cluster
entity to Falcon.
-*staging* must have 777 permissions and the parent dirs must have execute permissions
-*working* must have 755 permissions and the parent dirs must have execute permissions
-<verbatim>
-bin/falcon entity -submit -type cluster -file examples/entity/filesystem/standalone-cluster.xml
-</verbatim>
-Submit input and output feeds:
-<verbatim>
-bin/falcon entity -submit -type feed -file examples/entity/filesystem/in-feed.xml
-bin/falcon entity -submit -type feed -file examples/entity/filesystem/out-feed.xml
-</verbatim>
-Set-up workflow for the process:
-<verbatim>
-hadoop fs -put examples/app /
-</verbatim>
-Submit and schedule the process:
-<verbatim>
-bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/oozie-mr-process.xml
-bin/falcon entity -submitAndSchedule -type process -file examples/entity/filesystem/pig-process.xml
-</verbatim>
-Generate input data:
-<verbatim>
-examples/data/generate.sh <<hdfs endpoint>>
-</verbatim>
-Get status of instances:
-<verbatim>
-bin/falcon instance -status -type process -name oozie-mr-process -start 2013-11-15T00:05Z
-end 2013-11-15T01:00Z
-</verbatim>
-
-HCat based example entities are in examples/entity/hcat.
-
-
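
For readers following the build step in InstallationSteps, the optional version flags can be combined as in the sketch below. The helper falcon_build_cmd is hypothetical and only assembles the command line from the documented -Dhadoop.version and -Doozie.version options; the versions 2.5.0 and 4.0.0 are taken from the package.sh example in the same file and are not a recommendation.

```shell
# Hypothetical helper: assembles the documented build command, appending the
# optional Hadoop and Oozie version properties only when they are given.
# falcon_build_cmd [HADOOP_VERSION] [OOZIE_VERSION]
falcon_build_cmd() {
  cmd="mvn clean install"
  if [ -n "${1:-}" ]; then
    cmd="$cmd -Dhadoop.version=$1"
  fi
  if [ -n "${2:-}" ]; then
    cmd="$cmd -Doozie.version=$2"
  fi
  echo "$cmd"
}

# MAVEN_OPTS as given in Step 2 of the installation steps.
export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m -noverify"

# Print the command for a specific Hadoop/Oozie pair instead of running it.
falcon_build_cmd 2.5.0 4.0.0
```

Printing the command rather than running it keeps the sketch safe to try outside a Falcon checkout; inside {project dir} the printed line can be executed as is.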

