Subject: svn commit: r1660589 [10/14] - in /falcon: site/ site/0.3-incubating/ site/0.3-incubating/docs/ site/0.3-incubating/docs/restapi/ site/0.4-incubating/ site/0.4-incubating/docs/ site/0.4-incubating/docs/restapi/ site/0.5-incubating/ site/0.5-incubating/...
Date: Wed, 18 Feb 2015 10:56:00 -0000
From: sriksun@apache.org
To: commits@falcon.apache.org
Reply-To: dev@falcon.apache.org
Message-Id: <20150218105605.D8EAAAC0175@hades.apache.org>

Modified: falcon/site/FalconCLI.html
URL: http://svn.apache.org/viewvc/falcon/site/FalconCLI.html?rev=1660589&r1=1660588&r2=1660589&view=diff
==============================================================================
--- falcon/site/FalconCLI.html (original)
+++ falcon/site/FalconCLI.html Wed Feb 18 10:55:56 2015
@@ -245,7 +245,7 @@
  • Last Published: 2015-01-11
  • +
  • Last Published: 2015-02-18
  • @@ -290,13 +290,18 @@

    Summary

Summary of entities of a particular type on a cluster will be listed. The entity summary includes the N most recent instances of each entity.

    -

    Usage: $FALCON_HOME/bin/falcon entity -type [cluster|feed|process] -summary

    +

    Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -summary

    Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -fields <<field1,field2>> -filterBy <<field1:value1,field2:value2>> -tags <<tagkey=tagvalue,tagkey=tagvalue>> -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10 -numInstances 7

    Optional params described here.
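As an illustration, a summary request combining several of the optional parameters might look like the following; the field and filter values are illustrative rather than prescribed:

$FALCON_HOME/bin/falcon entity -type feed -summary -fields status,tags -filterBy STATUS:RUNNING -orderBy name -sortOrder asc -offset 0 -numResults 10 -numInstances 7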

    Update

    Update operation allows an already submitted/scheduled entity to be updated. Cluster update is currently not allowed.

    -

    Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -update [-effective <<effective time>>]

    +

    Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -update -file <<path_to_file>>

    +

    Example: $FALCON_HOME/bin/falcon entity -type process -name HourlyReportsGenerator -update -file /process/definition.xml

    +
    +

    Touch

    +

    Force Update operation allows an already submitted/scheduled entity to be updated.

    +

    Usage: $FALCON_HOME/bin/falcon entity -type [feed|process] -name <<name>> -touch
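For example, touching the process from the update example above:

$FALCON_HOME/bin/falcon entity -type process -name HourlyReportsGenerator -touch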

    Status

    Status returns the current status of the entity.

    @@ -314,8 +319,7 @@

    Kill

    Kill sub-command is used to kill all the instances of the specified process whose nominal time is between the given start time and end time.

    -

    Note: 1. For all the instance management sub-commands, if end time is not specified, Falcon will perform the actions on all the instances whose instance time falls after the start time.

    -

    2. The start time and end time needs to be specified in TZ format. Example: 01 Jan 2012 01:00 => 2012-01-01T01:00Z

    +

Note: 1. The start time and end time need to be specified in TZ format. Example: 01 Jan 2012 01:00 => 2012-01-01T01:00Z

2. Process name is a compulsory parameter for each instance management command.

    Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -kill -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"

    @@ -324,19 +328,19 @@

    Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -suspend -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"

    Continue

    -

    Continue option is used to continue the failed workflow instance. This option is valid only for process instances in terminal state, i.e. SUCCEDDED, KILLED or FAILED.

    -

    Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -re-run -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"

    +

    Continue option is used to continue the failed workflow instance. This option is valid only for process instances in terminal state, i.e. KILLED or FAILED.

    +

    Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -continue -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"

    Rerun

Rerun option is used to rerun instances of a given process. This option is valid only for process instances in terminal state, i.e. SUCCEEDED, KILLED or FAILED. Optionally, you can specify the properties to override.

    -

    Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -re-run -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" [-file <<properties file>>]

    +

    Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -rerun -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" [-file <<properties file>>]
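As a sketch, rerunning a day's worth of instances while overriding properties from a local file (the process name and file path are illustrative):

$FALCON_HOME/bin/falcon instance -type process -name HourlyReportsGenerator -rerun -start "2012-05-07T00:00Z" -end "2012-05-08T00:00Z" -file /tmp/override.properties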

    Resume

    Resume option is used to resume any instance that is in suspended state.

    Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -resume -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"

    Status

    -

    Status option via CLI can be used to get the status of a single or multiple instances. If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Along with the status of the instance time is also returned. Log location gives the oozie workflow url If the instance is in WAITING state, missing dependencies are listed

    +

Status option via CLI can be used to get the status of a single or multiple instances. If the instance is not yet materialized but is within the process validity range, WAITING is returned as the state. Along with the status, the instance time is also returned. Log location gives the Oozie workflow URL. If the instance is in WAITING state, missing dependencies are listed. The job URLs are populated for all actions of the user workflow and for non-succeeded actions of the main workflow, so the user need not go to the underlying scheduler for job URLs when debugging an issue in the job.

Example: Suppose a process has 3 instances: one has succeeded, one is running, and the other is waiting; the expected output is:

    {"status":"SUCCEEDED","message":"getStatus is successful","instances":[{"instance":"2012-05-07T05:02Z","status":"SUCCEEDED","logFile":"http://oozie-dashboard-url"},{"instance":"2012-05-07T05:07Z","status":"RUNNING","logFile":"http://oozie-dashboard-url"}, {"instance":"2010-01-02T11:05Z","status":"WAITING"}]

    Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -status
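For instance, querying the window covered by the example output above might look like this (the process name is illustrative):

$FALCON_HOME/bin/falcon instance -type process -name SampleProcess -status -start "2012-05-07T05:00Z" -end "2012-05-07T06:00Z"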

    @@ -365,6 +369,12 @@

    Optional Args : -colo <<colo>> -lifecycle <<lifecycles>> -filterBy <<field1:value1,field2:value2>> -orderBy <<field>> -sortOrder <<sortOrder>> -offset 0 -numResults 10

    Optional params described here.

    +

    FeedInstanceListing

    +

    Get falcon feed instance availability.

    +

Usage: $FALCON_HOME/bin/falcon instance -type feed -name <<name>> -listing

    +

    Optional Args : -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'" -colo <<colo>>

    +

    Optional params described here.

    +

    Logs

    Get logs for instance actions

    Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -logs

    @@ -379,27 +389,44 @@

Displays the workflow params of a given instance, where the start time is taken as the nominal time of that instance.

    Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -params -start "yyyy-MM-dd'T'HH:mm'Z'"
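For example, fetching the params of the instance whose nominal time is 2012-05-07T05:02Z (the process name is illustrative):

$FALCON_HOME/bin/falcon instance -type process -name SampleProcess -params -start "2012-05-07T05:02Z"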

    -

    Graphs Options

    +

    Metadata Lineage Options

    +
    +

    Lineage

    +

Returns the relationship between processes and feeds in a given pipeline, in dot format. You can use the output to view a graphical representation of the DAG in an online Graphviz viewer.

    +

    Usage:

    +

    $FALCON_HOME/bin/falcon metadata -lineage -pipeline my-pipeline

    +

    pipeline is a mandatory option.

    Vertex

    Get the vertex with the specified id.

    -

    Usage: $FALCON_HOME/bin/falcon graph -vertex -id <<id>>

    -

    Example: $FALCON_HOME/bin/falcon graph -vertex -id 4

    +

    Usage: $FALCON_HOME/bin/falcon metadata -vertex -id <<id>>

    +

    Example: $FALCON_HOME/bin/falcon metadata -vertex -id 4

    Vertices

    Get all vertices for a key index given the specified value.

    -

    Usage: $FALCON_HOME/bin/falcon graph -vertices -key <<key>> -value <<value>>

    -

    Example: $FALCON_HOME/bin/falcon graph -vertices -key type -value feed-instance

    +

    Usage: $FALCON_HOME/bin/falcon metadata -vertices -key <<key>> -value <<value>>

    +

    Example: $FALCON_HOME/bin/falcon metadata -vertices -key type -value feed-instance

    Vertex Edges

    Get the adjacent vertices or edges of the vertex with the specified direction.

    -

    Usage: $FALCON_HOME/bin/falcon graph -edges -id <<vertex-id>> -direction <<direction>>

    -

    Example: $FALCON_HOME/bin/falcon graph -edges -id 4 -direction both $FALCON_HOME/bin/falcon graph -edges -id 4 -direction inE

    +

    Usage: $FALCON_HOME/bin/falcon metadata -edges -id <<vertex-id>> -direction <<direction>>

    +

    Example: $FALCON_HOME/bin/falcon metadata -edges -id 4 -direction both $FALCON_HOME/bin/falcon metadata -edges -id 4 -direction inE

    Edge

    Get the edge with the specified id.

    -

    Usage: $FALCON_HOME/bin/falcon graph -edge -id <<id>>

    -

    Example: $FALCON_HOME/bin/falcon graph -edge -id Q9n-Q-5g

    +

    Usage: $FALCON_HOME/bin/falcon metadata -edge -id <<id>>

    +

    Example: $FALCON_HOME/bin/falcon metadata -edge -id Q9n-Q-5g

    +
    +

    Metadata Discovery Options

    +
    +

    List

    +

Lists all dimensions of a given type. If the user provides the optional param cluster, only the dimensions related to that cluster are listed. Usage: $FALCON_HOME/bin/falcon metadata -list -type [cluster_entity|feed_entity|process_entity|user|colo|tags|groups|pipelines]

    +

    Optional Args : -cluster <<cluster name>>

    +

    Example: $FALCON_HOME/bin/falcon metadata -list -type process_entity -cluster primary-cluster $FALCON_HOME/bin/falcon metadata -list -type tags

    +
    +

    Relations

    +

Lists all dimensions related to the specified dimension, identified by dimension-type and dimension-name. Usage: $FALCON_HOME/bin/falcon metadata -relations -type [cluster_entity|feed_entity|process_entity|user|colo|tags|groups|pipelines] -name <<Dimension Name>>

    +

    Example: $FALCON_HOME/bin/falcon metadata -relations -type process_entity -name sample-process

    Admin Options

    @@ -411,6 +438,14 @@

    Status

    Status returns the current state of Falcon (running or stopped). Usage: $FALCON_HOME/bin/falcon admin -status
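A minimal liveness check can be scripted around this sub-command; the sketch below assumes the falcon client exits non-zero while the server is unreachable:

# wait until the Falcon server reports running, then continue
until $FALCON_HOME/bin/falcon admin -status; do
  sleep 5
done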

    +
    +

    Recipe Options

    +
    +

    Submit Recipe

    +

    Submit the specified recipe.

    +

Usage: $FALCON_HOME/bin/falcon recipe -name <name> Name of the recipe. The user should have defined <name>-template.xml and <name>.properties in the path specified by falcon.recipe.path in the client.properties file. The falcon.home path is used if it is not specified in client.properties. If it is not specified in client.properties and the files also cannot be found at falcon.home, Falcon CLI will fail.

    +

Optional Args : -tool <recipeToolClassName> Falcon provides a base tool that recipes can override. If this option is not specified, the default RecipeTool is used. This option is required if the user defines their own recipe tool class.

    +

    Example: $FALCON_HOME/bin/falcon recipe -name hdfs-replication

Modified: falcon/site/FalconDocumentation.html
URL: http://svn.apache.org/viewvc/falcon/site/FalconDocumentation.html?rev=1660589&r1=1660588&r2=1660589&view=diff
==============================================================================
--- falcon/site/FalconDocumentation.html (original)
+++ falcon/site/FalconDocumentation.html Wed Feb 18 10:55:56 2015
@@ -245,7 +245,7 @@
  • Last Published: 2015-01-11
  • +
  • Last Published: 2015-02-18
  • @@ -269,10 +269,12 @@
  • Updating process and feed definition
  • Handling late input data
  • Idempotency
  • -
  • Alerting and Monitoring
  • Falcon EL Expressions
  • Lineage
  • -
  • Security
  • +
  • Security
  • +
  • Recipes
  • +
  • Monitoring
  • +
  • Backwards Compatibility Instructions
  • Architecture

    @@ -298,10 +300,10 @@

There are two basic components of a Falcon set up: Falcon Prism and Falcon Server. As the name suggests, Falcon Prism splits the requests it gets across the Falcon Servers. More details below:

    Stand Alone Mode

    -

    Stand alone mode is useful when the hadoop jobs and relevant data processing involves only one hadoop cluster. In this mode there is single Falcon server that contacts with oozie to schedule jobs on Hadoop. All the process / feed request like submit, schedule, suspend, kill are sent to this server only. For running in this mode one should use the falcon which has been built for standalone mode, or build using standalone option if using source code.

    +

Stand alone mode is useful when the hadoop jobs and relevant data processing involve only one hadoop cluster. In this mode there is a single Falcon server that contacts Oozie to schedule jobs on Hadoop. All the process/feed requests like submit, schedule, suspend, kill etc. are sent to this server. For running falcon in this mode one should use the falcon binary built with the standalone option.

    Distributed Mode

    -

    Distributed mode is the mode which you might me using most of the time. This is for organisations which have multiple instances of hadoop clusters, and multiple workflow schedulers to handle them. Here we have 2 components: Prism and Server. Both Prism and server have there own setup (runtime and startup properties) and there config locations. In this mode Prism acts as a contact point for Falcon servers. Below are the requests that can be sent to prism and server in this mode:

    +

Distributed mode is for multiple (colo) instances of hadoop clusters and multiple workflow schedulers to handle them. In this mode falcon has 2 components: Prism and Server(s). Both Prism and servers have their own setup (runtime and startup properties) and their own config locations. In this mode Prism acts as a contact point for Falcon servers. While all commands are available through Prism, only read and instance APIs are available through the Server. Below are the requests that can be sent to each of these:

Prism: submit, schedule, submitAndSchedule, Suspend, Resume, Kill, instance management
Server: schedule, suspend, resume, instance management

As observed above, submit and kill are kept exclusively as Prism operations to keep all the config stores in sync and to support the idempotency feature. Requests may also be sent through Prism but directed to a specific server using the "-colo" option from the CLI, or by appending the same to the web request when using the API.
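For example, a suspend request routed through Prism but targeted at a specific colo's server might look like this (the colo and process names are illustrative):

$FALCON_HOME/bin/falcon entity -suspend -type process -name SampleProcess -colo ua2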

When a cluster is submitted, it is by default sent to all the servers configured in the Prism. When a feed is submitted/scheduled, the request is sent only to the servers specified in the feed/process definitions. Servers are mentioned in the feed/process via CLUSTER tags in the xml definition.

    @@ -389,13 +391,10 @@ catalog:$database-name:$table-name#parti

    Delete operation on the entity removes any scheduled activity on the workflow engine, besides removing the entity from the falcon configuration store. Delete operation on an entity would only succeed if there are no dependent entities on the deleted entity.

    Update

    -

    Update operation allows an already submitted/scheduled entity to be updated. Cluster update is currently not allowed. Feed update can cause cascading update to all the processes already scheduled. Process update triggers update in falcon if entity is updated/the user specified workflow/lib is updated. The following set of actions are performed in Oozie to realize an update:

    +

Update operation allows an already submitted/scheduled entity to be updated. Cluster update is currently not allowed. Feed update can cause a cascading update to all the processes already scheduled. Process update triggers an update in falcon if the entity is updated. The following set of actions is performed in the scheduler to realize an update:

      -
    • Suspend the previously scheduled Oozie coordinator. This is to prevent any new action from being triggered.
    • -
    • Update the coordinator to set the end time to "now"
    • -
    • Resume the suspended coordinators
    • -
    • Schedule as per the new process/feed definition with the start time as "now"
    -

    Update optionally takes effective time as a parameter which is used as the end time of previously scheduled coordinator. So, the updated configuration will be effective since the given timestamp.

    +
  • Update the old scheduled entity to set the end time to "now"
  • +
  • Schedule as per the new process/feed definition with the start time as "now"
Instance Management actions

Instance Manager gives the user the option to control individual instances of the process based on their instance start time (start time of that instance). The start time needs to be given in standard TZ format. Example: 01 Jan 2012 01:00 => 2012-01-01T01:00Z

    @@ -442,7 +441,7 @@ catalog:$database-name:$table-name#parti

    With the integration of Hive, Falcon also provides retention for tables in Hive catalog.

    Example:

    -

If retention period is 10 hours, and the policy kicks in at time 't', the data retained by system is essentially the one falling in between [t-10h, t]. Any data in the boundaries [-∞, t-10h) and (t, ∞] is removed from the system.

    +

    If retention period is 10 hours, and the policy kicks in at time 't', the data retained by system is essentially the one in range [t-10h, t]. Any data before t-10h and after t is removed from the system.

    The 'action' attribute can attain values of DELETE/ARCHIVE. Based upon the tag value, the data eligible for removal is either deleted/archived.

    NOTE: Falcon 0.1/0.2 releases support Delete operation only

    @@ -520,6 +519,26 @@ catalog:$database-name:$table-name#parti
• The partition is not complete and hence not visible to users until all the data is committed on the secondary cluster (no dirty reads)
    +

    Archival as Replication

    +

Falcon allows users to archive data from on-premise to cloud, either Azure WASB or S3. It uses the underlying replication for archiving data from source to target. The archival URI is specified as the overridden location for the target cluster.

    +

    Example:

    +
    +
    +    <clusters>
    +        <cluster name="on-premise-cluster" type="source">
    +            <validity start="2021-11-01T00:00Z" end="2021-12-31T00:00Z"/>
    +        </cluster>
    +        <cluster name="cloud-cluster" type="target">
    +            <validity start="2011-11-01T00:00Z" end="2011-12-31T00:00Z"/>
    +            <locations>
    +                <location type="data"
    +                          path="wasb://test@blah.blob.core.windows.net/data/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
    +            </locations>
    +        </cluster>
    +    </clusters>
    +
    +
    +

    Relation between feed's retention limit and feed's late arrival cut off period:

    For reasons that are obvious, Falcon has an external validation that ensures that the user always specifies the feed retention limit to be more than the feed's allowed late arrival period. If this rule is violated by the user, the feed submission call itself throws back an error.

    @@ -656,58 +675,6 @@ validity start="2009-01-01T00:00Z&q

    Idempotency

All the operations in Falcon are idempotent. That is, if you make the same request to the falcon server / prism again, you will get a SUCCESSFUL return if it was SUCCESSFUL in the first attempt. For example, you submit a new process / feed and get a SUCCESSFUL message in return. Now if you run the same command / api request on the same entity, you will again get a SUCCESSFUL message. The same is true for other operations like schedule, kill, suspend and resume. Idempotency also takes care of the condition when a request is sent through prism and fails on one or more servers. For example, prism is configured to send requests to 3 servers. First, a user sends a request to SUBMIT a process on all 3 of them, and receives a SUCCESSFUL response from all of them. Then, due to some issue, one of the servers goes down, and the user sends a request to schedule the submitted process. This time the user will receive a response with PARTIAL status and a FAILURE message from the server that has gone down. If the user checks, the process will be found started and running on the 2 SUCCESSFUL servers. Now the issue with the server is figured out and it is brought up. Sending the SCHEDULE request again through prism will result in a SUCCESSFUL response from prism as well as all three servers, but this time the PROCESS will be SCHEDULED only on the server which had failed earlier; the other two will keep running as before.
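A concrete sketch of this behaviour reuses the sample cluster submission from the installation section later in this commit; running the identical submit twice returns SUCCESSFUL both times:

bin/falcon entity -submit -type cluster -file examples/entity/filesystem/standalone-cluster.xml
# the repeated, identical request is a no-op and also returns SUCCESSFUL
bin/falcon entity -submit -type cluster -file examples/entity/filesystem/standalone-cluster.xml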

    -

    Alerting and Monitoring

    -
    -

    Alerting

    -

    Falcon provides monitoring of various events by capturing metrics of those events. The metric numbers can then be used to monitor performance and health of the Falcon system and the entire processing pipelines.

    -

    Users can view the logs of these events in the metric.log file, by default this file is created under ${user.dir}/logs/ directory. Users may also extend the Falcon monitoring framework to send events to systems like Mondemand/lwes.

    -

    The following events are captured by Falcon for logging the metrics:

    -
      -
1. New cluster definitions posted to Falcon (success & failures)
2. New feed definition posted to Falcon (success & failures)
3. New process definition posted to Falcon (success & failures)
4. Process update events (success & failures)
5. Feed update events (success & failures)
6. Cluster update events (success & failures)
7. Process suspend events (success & failures)
8. Feed suspend events (success & failures)
9. Process resume events (success & failures)
10. Feed resume events (success & failures)
11. Process remove events (success & failures)
12. Feed remove events (success & failures)
13. Cluster remove events (success & failures)
14. Process instance kill events (success & failures)
15. Process instance re-run events (success & failures)
16. Process instance generation events
17. Process instance failure events
18. Process instance auto-retry events
19. Process instance retry exhaust events
20. Feed instance deletion event
21. Feed instance deletion failure event (no retries)
22. Feed instance replication event
23. Feed instance replication failure event
24. Feed instance replication auto-retry event
25. Feed instance replication retry exhaust event
26. Feed instance late arrival event
27. Feed instance post cut-off arrival event
28. Process re-run due to late feed event
29. Transaction rollback failed event
    -

    The metric logged for an event has the following properties:

    -
      -
1. Action - Name of the event.
2. Dimensions - A list of name/value pairs of various attributes for a given action.
3. Status - Status of an action: FAILED/SUCCEEDED.
4. Time-taken - Time taken in nanoseconds for a given action.
    -

    An example for an event logged for a submit of a new process definition:

    -

    2012-05-04 12:23:34,026 {Action:submit, Dimensions:{entityType=process}, Status: SUCCEEDED, Time-taken:97087000 ns}

    -

    Users may parse the metric.log or capture these events from custom monitoring frameworks and can plot various graphs or send alerts according to their requirements.

    -
    -

    Notifications

    -

    Falcon creates a JMS topic for every process/feed that is scheduled in Falcon. The implementation class and the broker url of the JMS engine are read from the dependent cluster's definition. Users may register consumers on the required topic to check the availability or status of feed instances.

    -

    For a given process that is scheduled, the name of the topic is same as the process name. Falcon sends a Map message for every feed produced by the instance of a process to the JMS topic. The JMS MapMessage sent to a topic has the following properties: entityName, feedNames, feedInstancePath, workflowId, runId, nominalTime, timeStamp, brokerUrl, brokerImplClass, entityType, operation, logFile, topicName, status, brokerTTL;

    -

    For a given feed that is scheduled, the name of the topic is same as the feed name. Falcon sends a map message for every feed instance that is deleted/archived/replicated depending upon the retention policy set in the feed definition. The JMS MapMessage sent to a topic has the following properties: entityName, feedNames, feedInstancePath, workflowId, runId, nominalTime, timeStamp, brokerUrl, brokerImplClass, entityType, operation, logFile, topicName, status, brokerTTL;

    -

    The JMS messages are automatically purged after a certain period (default 3 days) by the Falcon JMS house-keeping service.TTL (Time-to-live) for JMS message can be configured in the Falcon's startup.properties file.

    -

    Falcon EL Expressions

Falcon expression language can be used in a process definition to specify the start and end instances for various feeds.

Before going into how to use Falcon EL expressions, it is necessary to understand what instance and instance start time refer to with respect to Falcon.

    @@ -805,15 +772,21 @@ validity start="2009-01-01T00:00Z&q
     config name: *.application.services
     config value: org.apache.falcon.metadata.MetadataMappingService
    -<verbatim>
     
    -Lineage is only captured for Process executions. A future release will capture lineage for
    -lifecycle policies such as replication and retention.
    -
    ---++ Security
    -
    -Security is detailed in [[Security][Security]].
    -
    + +

    Lineage is only captured for Process executions. A future release will capture lineage for lifecycle policies such as replication and retention.

    +
    +

    Security

    +

    Security is detailed in Security.

    +
    +

    Recipes

    +

Recipes are detailed in Recipes.

    +
    +

    Monitoring

    +

Monitoring and operationalizing Falcon are detailed in Operability.

    +
    +

    Backwards Compatibility

    +

    Backwards compatibility instructions are detailed here.

Modified: falcon/site/HiveIntegration.html
URL: http://svn.apache.org/viewvc/falcon/site/HiveIntegration.html?rev=1660589&r1=1660588&r2=1660589&view=diff
==============================================================================
--- falcon/site/HiveIntegration.html (original)
+++ falcon/site/HiveIntegration.html Wed Feb 18 10:55:56 2015
@@ -245,7 +245,7 @@
  • Last Published: 2015-01-11
  • +
  • Last Published: 2015-02-18
  • @@ -288,8 +288,8 @@ catalog.service.impl=org.apache.falcon.c

    Hence, Falcon for Hive support needs Oozie 4.x.

    Oozie Shared Library setup

    -

    Falcon post Hive integration depends heavily on the shared library feature of Oozie. Since the sheer number of jars for HCatalog, Pig and Hive are in the many 10s in numbers, its quite daunting to redistribute the dependent jars from Falcon.

    -

    This is a one time effort in Oozie setup and is quite straightforward.

    +

Falcon post Hive integration depends heavily on the shared library feature of Oozie. Since the sheer number of jars for HCatalog, Pig and Hive runs into the many tens, it's quite daunting to redistribute the dependent jars from Falcon.

    +

    This is a one time effort in Oozie setup and is quite straightforward.

    Approach

@@ -442,7 +442,7 @@ org.apache.hadoop.hive.ql.parse.ImportSe
     <interface type="execute" endpoint="localhost:10300" version="1.1.1" />
     <interface type="workflow" endpoint="http://localhost:11010/oozie/"
-               version="3.3.0" />
+               version="4.0.1" />
     <interface type="registry" endpoint="thrift://localhost:19083" version="0.11.0" />
     <interface type="messaging" endpoint="tcp://localhost:61616?daemon=true"
@@ -475,7 +475,7 @@ org.apache.hadoop.hive.ql.parse.ImportSe
     <interface type="execute" endpoint="localhost:20300" version="1.1.1" />
     <interface type="workflow" endpoint="http://localhost:11020/oozie/"
-               version="3.3.0" />
+               version="4.0.1" />
     <interface type="registry" endpoint="thrift://localhost:29083" version="0.11.0" />
     <interface type="messaging" endpoint="tcp://localhost:61616?daemon=true"

Modified: falcon/site/InstallationSteps.html
URL: http://svn.apache.org/viewvc/falcon/site/InstallationSteps.html?rev=1660589&r1=1660588&r2=1660589&view=diff
==============================================================================
--- falcon/site/InstallationSteps.html (original)
+++ falcon/site/InstallationSteps.html Wed Feb 18 10:55:56 2015
@@ -245,7 +245,7 @@
  • Last Published: 2015-01-11
  • +
  • Last Published: 2015-02-18
  • @@ -260,15 +260,22 @@

    Building Falcon

    +You would need the following installed to build Falcon
    +
    +* JDK 1.7
    +* Maven 3.x
    +
     git clone https://git-wip-us.apache.org/repos/asf/falcon.git falcon
     
     cd falcon
     
    -export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean install [For hadoop 1]
    -export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m" && mvn clean install -Phadoop-2 [For hadoop 2]
    +export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m -noverify" && mvn clean install
     
     [optionally -Dhadoop.version=<<hadoop.version>> can be appended to build for a specific version of hadoop]
    -[optionally -Doozie.version=<<oozie version>> can be appended to build with a specific version of oozie. Oozie versions >= 3.oozie-3.2.0-incubating are supported]
    +*Note:* Falcon drops support for Hadoop-1 and only supports Hadoop-2 from Falcon 0.6 onwards
    +[optionally -Doozie.version=<<oozie version>> can be appended to build with a specific version of oozie.
    +Oozie versions >= 4 are supported]
Falcon builds with JDK 1.7 using the -noverify option
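# Example combining the documented options above; the version numbers are
# illustrative values taken from elsewhere on this page:
export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m -noverify" && mvn clean install -Dhadoop.version=2.5.0 -Doozie.version=4.0.1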
     
     
     
    @@ -277,12 +284,11 @@ export MAVEN_OPTS="-Xmx1024m -XX:Ma
     
    -mvn clean assembly:assembly -DskipTests -DskipCheck=true [For hadoop 1]
    -mvn clean assembly:assembly -DskipTests -DskipCheck=true -P hadoop-2 [For hadoop 2]
    +mvn clean assembly:assembly -DskipTests -DskipCheck=true
     
     
     
    -

    Tar can be found in {project dir}/target/falcon-${project.version}-bin.tar.gz

    +

    Tar can be found in {project dir}/target/apache-falcon-${project.version}-bin.tar.gz

    Tar is structured as follows

    @@ -318,12 +324,11 @@ mvn clean assembly:assembly -DskipTests
     
     
    -mvn clean assembly:assembly -DskipTests -DskipCheck=true -Pdistributed,hadoop-1 [For hadoop 1]
    -mvn clean assembly:assembly -DskipTests -DskipCheck=true -Pdistributed,hadoop-2 [For hadoop 2]
    +mvn clean assembly:assembly -DskipTests -DskipCheck=true -Pdistributed,hadoop-2
     
     
     
    -

    Tar can be found in {project dir}/target/falcon-distributed-${project.version}-server.tar.gz

    +

    Tar can be found in {project dir}/target/apache-falcon-distributed-${project.version}-server.tar.gz

    Tar is structured as follows

    @@ -412,15 +417,27 @@ cd falcon-distributed-${project.version}
     #export FALCON_EXPANDED_WEBAPP_DIR=
     
     
    +

    NOTE for Mac OS users

    +
    +
    +If you are using a Mac OS, you will need to configure the FALCON_SERVER_OPTS (explained above).
    +
    +In  {package dir}/conf/falcon-env.sh uncomment the following line
    +#export FALCON_SERVER_OPTS=
    +
    +and change it to look as below
    +export FALCON_SERVER_OPTS="-Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
    +
    +

    Starting Falcon Server

     bin/falcon-start [-port <port>]
     
     
    -

    By default, * falcon server starts at port 15443 (https) by default . To change the port, use -port option

    +

By default:
• If falcon.enableTLS is set to true explicitly or not set at all, falcon starts at port 15443 on https://.
• If falcon.enableTLS is set to false explicitly, falcon starts at port 15000 on http://.
• To change the port, use the -port option.

      -
• If falcon.enableTLS is not set explicitly, a port that ends with 443 will automatically put falcon on https://. Any other port will put falcon on http://.
• falcon server starts embedded active mq. To control this behaviour, set the following system properties using the -D option in the FALCON_OPTS environment variable (see the example after this list):
    • falcon.embeddedmq=<true/false> - Should server start embedded active mq, default true
    • falcon.embeddedmq.port=<port> - Port for embedded active mq, default 61616
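For example, a sketch that disables the embedded broker in favour of an externally managed ActiveMQ (broker setup itself is out of scope here):

export FALCON_OPTS="-Dfalcon.embeddedmq=false"
bin/falcon-start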
    • @@ -450,6 +467,18 @@ bin/falcon help

    Dashboard

    Once falcon / prism is started, you can view the status of falcon entities using the Web-based dashboard. The web UI works in both distributed and embedded mode. You can open your browser at the corresponding port to use the web UI.

    +

    Falcon dashboard makes the REST api calls as user "falcon-dashboard". If this user does not exist on your falcon and oozie servers, please create the user.

    +
    +
    +## create user.
    +[root@falconhost ~] useradd -U -m falcon-dashboard -G users
    +
    +## verify user is created with membership in correct groups.
    +[root@falconhost ~] groups falcon-dashboard
    +falcon-dashboard : falcon-dashboard users
    +[root@falconhost ~]
    +
    +

    Stopping Falcon Server

    @@ -469,9 +498,10 @@ bin/prism-stop
     cd <<project home>>
     src/bin/package.sh <<hadoop-version>> <<oozie-version>>
     
    ->> ex. src/bin/package.sh 1.1.2 3.1.3-incubating or src/bin/package.sh 0.20.2-cdh3u5 4.0.0
    ->> Falcon package is available in <<falcon home>>/target/falcon-<<version>>-bin.tar.gz
    ->> Oozie package is available in <<falcon home>>/target/oozie-3.3.2-distro.tar.gz
    +>> ex. src/bin/package.sh 1.1.2 4.0.1 or src/bin/package.sh 0.20.2-cdh3u5 4.0.1
    +>> ex. src/bin/package.sh 2.5.0 4.0.0
    +>> Falcon package is available in <<falcon home>>/target/apache-falcon-<<version>>-bin.tar.gz
    +>> Oozie package is available in <<falcon home>>/target/oozie-4.0.1-distro.tar.gz
     
     
    @@ -481,7 +511,7 @@ src/bin/package.sh <<hadoop-versio bin/falcon-start
    -

    Make sure the hadoop and oozie endpoints are according to your setup in examples/entity/filesystem/standalone-cluster.xml

    +

Make sure the hadoop and oozie endpoints are according to your setup in examples/entity/filesystem/standalone-cluster.xml. The cluster locations, staging and working dirs, MUST be created prior to submitting a cluster entity to Falcon: the staging dir must have 777 permissions and its parent dirs must have execute permissions; the working dir must have 755 permissions and its parent dirs must have execute permissions.
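A hedged sketch of that directory setup, assuming the cluster entity points its staging and working locations at the illustrative paths below:

# paths are illustrative; use the locations from your cluster entity xml
hadoop fs -mkdir -p /apps/falcon/staging /apps/falcon/working
hadoop fs -chmod 777 /apps/falcon/staging
hadoop fs -chmod 755 /apps/falcon/working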

     bin/falcon entity -submit -type cluster -file examples/entity/filesystem/standalone-cluster.xml
    
    Added: falcon/site/MigrationInstructions.html
    URL: http://svn.apache.org/viewvc/falcon/site/MigrationInstructions.html?rev=1660589&view=auto
    ==============================================================================
    --- falcon/site/MigrationInstructions.html (added)
    +++ falcon/site/MigrationInstructions.html Wed Feb 18 10:55:56 2015
    @@ -0,0 +1,293 @@
+    Falcon - Migration Instructions

    Migration Instructions

    +
    +

    Migrate from 0.5-incubating to 0.6-incubating

    +

    This is a placeholder wiki for migration instructions from falcon 0.5-incubating to 0.6-incubating.

    +
    +

    Update Entities

    +
    +

    Change cluster dir permissions

    +
    +

    Enable/Disable TLS

    +
    +

    Authorization

    +
    +
    + +
Modified: falcon/site/OnBoarding.html
URL: http://svn.apache.org/viewvc/falcon/site/OnBoarding.html?rev=1660589&r1=1660588&r2=1660589&view=diff
==============================================================================
--- falcon/site/OnBoarding.html (original)
+++ falcon/site/OnBoarding.html Wed Feb 18 10:55:56 2015
@@ -245,7 +245,7 @@
  • Last Published: 2015-01-11
  • +
  • Last Published: 2015-02-18
  • @@ -276,7 +276,7 @@

    Sample Pipeline

    Cluster
    -

    Cluster definition that contains end points for name node, job tracker, oozie and jms server:

    +

Cluster definition that contains end points for name node, job tracker, oozie and jms server. The cluster locations MUST be created prior to submitting a cluster entity to Falcon: the staging dir must have 777 permissions and its parent dirs must have execute permissions; the working dir must have 755 permissions and its parent dirs must have execute permissions.

     <?xml version="1.0"?>
    @@ -286,13 +286,13 @@
     <cluster colo="ua2" description="" name="corp" xmlns="uri:falcon:cluster:0.1"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">    
         <interfaces>
    -        <interface type="readonly" endpoint="hftp://name-node.com:50070" version="0.20.2-cdh3u0" />
    +        <interface type="readonly" endpoint="hftp://name-node.com:50070" version="2.5.0" />
     
    -        <interface type="write" endpoint="hdfs://name-node.com:54310" version="0.20.2-cdh3u0" />
    +        <interface type="write" endpoint="hdfs://name-node.com:54310" version="2.5.0" />
     
    -        <interface type="execute" endpoint="job-tracker:54311" version="0.20.2-cdh3u0" />
    +        <interface type="execute" endpoint="job-tracker:54311" version="2.5.0" />
     
    -        <interface type="workflow" endpoint="http://oozie.com:11000/oozie/" version="3.1.4" />
    +        <interface type="workflow" endpoint="http://oozie.com:11000/oozie/" version="4.0.1" />
     
             <interface type="messaging" endpoint="tcp://jms-server.com:61616?daemon=true" version="5.1.6" />
         </interfaces>
    
    Added: falcon/site/Operability.html
    URL: http://svn.apache.org/viewvc/falcon/site/Operability.html?rev=1660589&view=auto
    ==============================================================================
    --- falcon/site/Operability.html (added)
    +++ falcon/site/Operability.html Wed Feb 18 10:55:56 2015
    @@ -0,0 +1,343 @@
+    Falcon - Operationalizing Falcon

    Operationalizing Falcon

    +
    +

    Overview

    +

    Apache Falcon provides various tools to operationalize Falcon consisting of Alerts for unrecoverable errors, Audits of user actions, Metrics, and Notifications. They are detailed below.

    +
    +

    Monitoring

    +

    Falcon provides monitoring of various events by capturing metrics of those events. The metric numbers can then be used to monitor performance and health of the Falcon system and the entire processing pipelines.

    +

Users can view the logs of these events in the metric.log file; by default this file is created under the ${user.dir}/logs/ directory. Users may also extend the Falcon monitoring framework to send events to systems like Mondemand/lwes by implementing the org.apache.falcon.plugin.MonitoringPlugin interface.

    +

    The following events are captured by Falcon for logging the metrics:

    +
      +
1. New cluster definitions posted to Falcon (success & failures)
2. New feed definition posted to Falcon (success & failures)
3. New process definition posted to Falcon (success & failures)
4. Process update events (success & failures)
5. Feed update events (success & failures)
6. Cluster update events (success & failures)
7. Process suspend events (success & failures)
8. Feed suspend events (success & failures)
9. Process resume events (success & failures)
10. Feed resume events (success & failures)
11. Process remove events (success & failures)
12. Feed remove events (success & failures)
13. Cluster remove events (success & failures)
14. Process instance kill events (success & failures)
15. Process instance re-run events (success & failures)
16. Process instance generation events
17. Process instance failure events
18. Process instance auto-retry events
19. Process instance retry exhaust events
20. Feed instance deletion event
21. Feed instance deletion failure event (no retries)
22. Feed instance replication event
23. Feed instance replication failure event
24. Feed instance replication auto-retry event
25. Feed instance replication retry exhaust event
26. Feed instance late arrival event
27. Feed instance post cut-off arrival event
28. Process re-run due to late feed event
29. Transaction rollback failed event
    +

    The metric logged for an event has the following properties:

    +
      +
1. Action - Name of the event.
2. Dimensions - A list of name/value pairs of various attributes for a given action.
3. Status - Status of an action: FAILED/SUCCEEDED.
4. Time-taken - Time taken in nanoseconds for a given action.
    +

    An example for an event logged for a submit of a new process definition:

    +

    2012-05-04 12:23:34,026 {Action:submit, Dimensions:{entityType=process}, Status: SUCCEEDED, Time-taken:97087000 ns}

    +

    Users may parse the metric.log or capture these events from custom monitoring frameworks and can plot various graphs or send alerts according to their requirements.

    +
    +

    Notifications

    +

    Falcon creates a JMS topic for every process/feed that is scheduled in Falcon. The implementation class and the broker url of the JMS engine are read from the dependent cluster's definition. Users may register consumers on the required topic to check the availability or status of feed instances.

    +

    For a given process that is scheduled, the name of the topic is same as the process name. Falcon sends a Map message for every feed produced by the instance of a process to the JMS topic. The JMS MapMessage sent to a topic has the following properties: entityName, feedNames, feedInstancePath, workflowId, runId, nominalTime, timeStamp, brokerUrl, brokerImplClass, entityType, operation, logFile, topicName, status, brokerTTL;

    +

    For a given feed that is scheduled, the name of the topic is same as the feed name. Falcon sends a map message for every feed instance that is deleted/archived/replicated depending upon the retention policy set in the feed definition. The JMS MapMessage sent to a topic has the following properties: entityName, feedNames, feedInstancePath, workflowId, runId, nominalTime, timeStamp, brokerUrl, brokerImplClass, entityType, operation, logFile, topicName, status, brokerTTL;

    +

The JMS messages are automatically purged after a certain period (default 3 days) by the Falcon JMS house-keeping service. TTL (Time-to-live) for JMS messages can be configured in Falcon's startup.properties file.

    +
    +

    Alerts

    +

    Falcon generates alerts for unrecoverable errors into a log file by default. Users can view these alerts in the alerts.log file, by default this file is created under ${user.dir}/logs/ directory.

    +

    Users may also extend the Falcon Alerting plugin to send events to systems like Nagios, etc. by extending org.apache.falcon.plugin.AlertingPlugin interface.

    +
    +

    Audits

    +

    Falcon audits all user activity and captures them into a log file by default. Users can view these audits in the audit.log file, by default this file is created under ${user.dir}/logs/ directory.

    +

    Users may also extend the Falcon Audit plugin to send audits to systems like Apache Argus, etc. by extending org.apache.falcon.plugin.AuditingPlugin interface.

    +
    +
    + +
Modified: falcon/site/Security.html
URL: http://svn.apache.org/viewvc/falcon/site/Security.html?rev=1660589&r1=1660588&r2=1660589&view=diff
==============================================================================
--- falcon/site/Security.html (original)
+++ falcon/site/Security.html Wed Feb 18 10:55:56 2015
@@ -245,7 +245,7 @@
  • Last Published: 2015-01-11
  • +
  • Last Published: 2015-02-18
  • @@ -289,19 +289,40 @@

    Super-User

    The super-user is the user with the same identity as falcon process itself. Loosely, if you started the falcon, then you are the super-user. The super-user can do anything in that permissions checks never fail for the super-user. There is no persistent notion of who was the super-user; when the falcon is started the process identity determines who is the super-user for now. The Falcon super-user does not have to be the super-user of the falcon host, nor is it necessary that all clusters have the same super-user. Also, an experimenter running Falcon on a personal workstation, conveniently becomes that installation's super-user without any configuration.

    -

    Falcon also allows users to configure a super user group and allows users belonging to this group to be a super user.

    +

    Falcon also allows users to configure a super user group and allows users belonging to this group to be a super user.

    +

    ACL owner and group must be valid even if the authenticated user is a super-user.

    Group Memberships

    Once a user has been authenticated and a username has been determined, the list of groups is determined by a group mapping service, configured by the hadoop.security.group.mapping property in Hadoop. The default implementation, org.apache.hadoop.security.ShellBasedUnixGroupsMapping, will shell out to the Unix bash -c groups command to resolve a list of groups for a user.

    -

    Note that Falcon stores the user and group of an Entity as strings; there is no conversion from user and group identity numbers as is conventional in Unix.

    +

    Note that Falcon stores the user and group of an Entity as strings; there is no conversion from user and group identity numbers as is conventional in Unix.

    +

    The only limitation is that a user cannot add a group in ACL that he does not belong to.

    Authorization Provider

Falcon provides a pluggable provider interface for Authorization. It also ships with a default implementation that enforces the following authorization policy.

    Entity and Instance Management Operations Policy
    -

    * All Entity and Instance operations are authorized for users who created them, Owners and users with group memberships * Reference to entities with in a feed or process is allowed with out enforcing permissions Any Feed or Process can refer to a Cluster entity not owned by the Feed or Process owner Any Process can refer to a Feed entity not owned by the Process owner

    +

    +
      +
    • All Entity and Instance operations are authorized for users who created them, Owners and users with group memberships
    • +
• Reference to entities within a feed or process is allowed without enforcing permissions
    +

Any Feed or Process can refer to a Cluster entity not owned by the Feed or Process owner. Any Process can refer to a Feed entity not owned by the Process owner.

    The authorization is enforced in the following way:

    -

    if admin resource, if authenticated user name matches the admin users configuration Else if groups of the authenticated user matches the admin groups configuration Else authorization exception is thrown Else if entities or instance resource if the authenticated user matches the owner in ACL for the entity Else if the groups of the authenticated user matches the group in ACL for the entity Else authorization exception is thrown Else if lineage resource All have read-only permissions, reason being folks should be able to examine the dependency and allow reuse

    +

    +
      +
• if admin resource,
  • If authenticated user name matches the admin users configuration
  • Else if groups of the authenticated user matches the admin groups configuration
  • Else authorization exception is thrown
• Else if entities or instance resource
  • If the authenticated user matches the owner in ACL for the entity
  • Else if the groups of the authenticated user matches the group in ACL for the entity
  • Else authorization exception is thrown
• Else if lineage resource
  • All have read-only permissions, reason being folks should be able to examine the dependency and allow reuse
    +

To authenticate a user for REST API calls, the user should append "user.name=<username>" to the query.
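For example, with TLS disabled (port 15000, per the installation notes in this commit), an authenticated call to the admin version endpoint might look like the following; the host name and user are illustrative, and the endpoint path assumes Falcon's standard REST layout:

curl "http://falcon-server:15000/api/admin/version?user.name=sample-user"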

Operations on Entity Resource

    @@ -414,7 +435,7 @@
    Admin User/Group
    Lineage Resource Policy
    -

    Lineage is read-only and hence all users can look at lineage for their respective entities.

    +

    Lineage is read-only and hence all users can look at lineage for their respective entities. Note: This gap will be fixed in a later release.

    Authentication Configuration

    Following is the Server Side Configuration Setup for Authentication.

    @@ -472,6 +493,9 @@ # Comma separated list of black listed users *.falcon.http.authentication.blacklisted.users= +# Increase Jetty request buffer size to accommodate the generated Kerberos token +*.falcon.jetty.request.buffer.size=16192 +

    Pseudo/Simple Configuration

@@ -572,22 +596,32 @@
 Configuration Store
 ${config.store.uri}
 falcon
-750
+700
-Oozie coord/bundle XMLs
-${cluster.staging-location}/workflows/{entity}/{entity-name}
+Cluster Staging Location
+${cluster.staging-location}
 falcon
-644
+777
+Cluster Working Location
+${cluster.working-location}
+falcon
+755
 Shared libs
 {cluster.working}/{lib,libext}
 falcon
 755
+Oozie coord/bundle XMLs
+${cluster.staging-location}/workflows/{entity}/{entity-name}
+$user
+cluster umask
 App logs
 ${cluster.staging-location}/workflows/{entity}/{entity-name}/logs
-falcon
-777
+$user
+cluster umask

Note: Please note that the cluster staging and working locations MUST be created prior to submitting a cluster entity to Falcon. Also, note that the parent dirs must have execute permissions.

    Backwards compatibility

    @@ -606,13 +640,11 @@

The blacklisted users list used to include the following super users: hdfs, mapreduce, oozie, and falcon. The list has been externalized from code into the startup.properties file; it is now empty and needs to be configured explicitly in the file.

    Falcon Dashboard

    -

    The dashboard assumes an anonymous user in Pseudo/Simple method and hence anonymous users must be enabled for it to work.

    -
    -
    -# Indicates if anonymous requests are allowed when using 'simple' authentication.
    -*.falcon.http.authentication.simple.anonymous.allowed=true
    -
    -
    +

    To initialize the current user for dashboard, user should append query param "user.name=<username>" to the REST api call.

    +

    If dashboard user wishes to change the current user, they should do the following.

    +
      +
• delete the hadoop.auth cookie from browser cache.
• append query param "user.name=<new_user>" to the next REST API call.

    In Kerberos method, the browser must support HTTP Kerberos SPNEGO.

    Known Limitations