falcon-dev mailing list archives

From "Srikanth Sundarrajan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-141) Support cluster updates
Date Fri, 27 Nov 2015 11:06:11 GMT

    [ https://issues.apache.org/jira/browse/FALCON-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029768#comment-15029768 ]

Srikanth Sundarrajan commented on FALCON-141:
---------------------------------------------

[~sandeep.samudrala], [~pragya.mittal] and I discussed offline how cluster updates could work. I wanted to put our thoughts down here for broader discussion and consideration.

+Cluster updates are necessary to handle the following scenarios:+
* Update from a non-secure Hadoop cluster to a secure one, or vice versa
* Update from non-HA to HA, or vice versa
* Update of the end points of any of the interfaces
* Update of properties in the cluster entity

In general, a cluster update should perform the following actions:
* Validate the new entity definition
* Perform a touch of all feed and process entities (after deduplicating them) to complete the operation
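The two steps above can be sketched as follows. This is a minimal, hypothetical model, not Falcon code: it assumes the entities to touch are the feeds and processes that reference an updated cluster, represented here by an in-memory `dependencies` map, and the names `EntityRef`, `dependents_to_touch`, `update_clusters`, `validate` and `touch` are all illustrative. It shows why the deduplication matters: a feed that spans two updated clusters is touched only once, and nothing is touched unless every new definition validates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityRef:
    """Hypothetical reference to a feed or process entity."""
    entity_type: str  # "feed" or "process"
    name: str


def dependents_to_touch(cluster_names, dependencies):
    """Collect the deduplicated, order-preserving list of entities that depend
    on any updated cluster. `dependencies` (hypothetical) maps a cluster name to
    the list of EntityRef objects that reference it."""
    seen = set()
    ordered = []
    for cluster in cluster_names:
        for ref in dependencies.get(cluster, []):
            if ref not in seen:  # an entity spanning several clusters is touched once
                seen.add(ref)
                ordered.append(ref)
    return ordered


def update_clusters(new_defs, validate, dependencies, touch):
    """Validate every new cluster definition first, then touch each dependent once.

    `validate(name, definition)` is expected to raise on an invalid definition,
    in which case no entity is touched. Returns the entities that were touched."""
    for name, definition in new_defs.items():
        validate(name, definition)
    touched = []
    for ref in dependents_to_touch(list(new_defs), dependencies):
        touch(ref)
        touched.append(ref)
    return touched
```

Validating all definitions before the first touch keeps a bad definition from leaving the system half-updated.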

We were considering using OOZIE-2187 to centralize the end points and simplify the updates, but there are a few shortcomings with that approach, as pointed out by [~venkatnrangan] during the bi-weekly sync-up call:
* Cross-cluster Hive replication may refer to multiple NN/JT end points in a single workflow, so we can't piggyback on the global conf
* There may be other interfaces defined in the cluster entity that are not supported in Oozie's global section
* This may not work without performing a touch on every entity in the system after the feature is enabled

+The new proposal is as follows+
* Add an admin option to put Falcon in a special mode: "safe-mode" or "-initialize-update"
* In this mode, disallow all operations except some read-only operations, over and above FALCON-1623
* Accept the cluster update operation and store the updated cluster definition in a staging directory, without actually performing the update
* Use an admin option to leave "safe-mode" ("finalize-update"), which performs the cluster update (validation of the entity followed by updates of dependent entities). The system will leave safe-mode only if it is able to perform the update; otherwise it will remain in safe-mode.
* If the cluster update succeeds but a dependent entity update fails, a touch operation can be performed on that entity to move forward.
??On restart, the Falcon server will automatically put itself in safe-mode if it finds any entity in the staging directory??
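The proposed lifecycle is essentially a small state machine. The sketch below models it in memory under stated assumptions; `FalconServerModel`, `SafeModeError`, `write_operation`, `submit_cluster_update` and `leave_safe_mode` are hypothetical names, not Falcon's admin API or CLI. It shows the intended rules: write operations are rejected in safe mode, cluster updates are only staged, finalizing leaves safe mode only if validation passes, and dependent-entity failures do not block leaving but are reported so they can be touched afterwards.

```python
class SafeModeError(Exception):
    """Raised when an operation is not allowed in the current mode."""


class FalconServerModel:
    """Hypothetical in-memory model of the proposed safe-mode workflow."""

    def __init__(self, staged=None):
        self.staged = dict(staged or {})  # cluster name -> staged new definition
        self.clusters = {}                # cluster name -> applied definition
        # Per the proposal, a server that finds staged updates starts in safe mode.
        self.safe_mode = bool(self.staged)

    def enter_safe_mode(self):
        """Models the "safe-mode" / "-initialize-update" admin option."""
        self.safe_mode = True

    def get_cluster(self, name):
        """A read-only operation, allowed in either mode."""
        return self.clusters.get(name)

    def write_operation(self, op):
        """Any ordinary write (submit, schedule, ...) is rejected in safe mode."""
        if self.safe_mode:
            raise SafeModeError(f"{op} is not allowed in safe mode")
        return f"{op} done"

    def submit_cluster_update(self, name, definition):
        """Stage the new definition only; nothing is applied yet."""
        if not self.safe_mode:
            raise SafeModeError("cluster updates are only accepted in safe mode")
        self.staged[name] = definition

    def leave_safe_mode(self, validate, update_dependents):
        """Models "finalize-update". Returns (left_safe_mode, entities_needing_touch).

        `validate(name, definition)` raises on an invalid definition;
        `update_dependents(cluster_names)` returns the entities whose update failed."""
        try:
            for name, definition in self.staged.items():
                validate(name, definition)
        except Exception:
            return False, []  # stay in safe mode and keep the staged updates
        self.clusters.update(self.staged)
        applied = list(self.staged)
        self.staged.clear()
        failed = update_dependents(applied)  # these can be touched later
        self.safe_mode = False
        return True, failed
```

With no staged updates, `leave_safe_mode` validates nothing and simply returns the server to normal, matching the NOOP case.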

*Some scenarios and how they play out with the new proposal*
+Move to safe mode, no updates+
* Issue the admin option to move to safe mode; don't perform any cluster entity update operation
* Leave safe mode: returns to normal (NOOP)

+Move to safe mode, no updates, restart Falcon+
* Issue the admin option to move to safe mode; don't perform any cluster entity update operation
* Restart Falcon
* On restart, Falcon checks for staged cluster entity updates, finds none and starts normally

+Move to safe mode, update one or more entities, leave safe mode+
* Issue the admin option to move to safe mode
* Perform the cluster entity update operation
* Issue the leave safe mode admin op
* Falcon checks for the existence of staged cluster updates
* Validates the cluster entity
* Performs the cluster update
* Performs the update on dependent entities
* Leaves safe mode

+Move to safe mode, update one or more entities, restart, leave safe mode+
* Issue the admin option to move to safe mode
* Perform the cluster entity update operation
* Restart the Falcon server
* Falcon finds the staged updates and moves to safe mode automatically
* Issue the leave safe mode admin op
* Falcon checks for the existence of staged cluster updates
* Validates the cluster entity
* Performs the cluster update
* Performs the update on dependent entities
* Leaves safe mode
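The restart scenarios depend on staged updates surviving a server restart. A minimal sketch of that startup check is below, assuming (hypothetically) one staged cluster definition per `.xml` file in a local staging directory; the real staging location and file layout are not specified in this proposal, and every function name here is illustrative. `demo()` walks through the "no updates, restart" and "update, restart" cases with a temporary directory.

```python
import os
import tempfile

STAGING_SUFFIX = ".xml"  # hypothetical layout: one staged cluster definition per file


def stage_cluster_update(staging_dir, cluster_name, definition_xml):
    """Persist a staged cluster definition so it survives a server restart."""
    os.makedirs(staging_dir, exist_ok=True)
    path = os.path.join(staging_dir, cluster_name + STAGING_SUFFIX)
    with open(path, "w", encoding="utf-8") as f:
        f.write(definition_xml)
    return path


def load_staged_updates(staging_dir):
    """Return {cluster name: definition} for every staged update found on disk."""
    if not os.path.isdir(staging_dir):
        return {}
    staged = {}
    for entry in sorted(os.listdir(staging_dir)):
        if entry.endswith(STAGING_SUFFIX):
            with open(os.path.join(staging_dir, entry), encoding="utf-8") as f:
                staged[entry[: -len(STAGING_SUFFIX)]] = f.read()
    return staged


def start_server(staging_dir):
    """Startup check from the proposal: any staged update means start in safe mode."""
    staged = load_staged_updates(staging_dir)
    return {"safe_mode": bool(staged), "staged": staged}


def clear_staged_update(staging_dir, cluster_name):
    """Remove a staged file once finalize has applied that cluster update."""
    os.remove(os.path.join(staging_dir, cluster_name + STAGING_SUFFIX))


def demo():
    """Walk through the restart scenarios using a throwaway staging directory."""
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")
        no_updates = start_server(staging)["safe_mode"]  # restart, nothing staged
        stage_cluster_update(staging, "primary", "<cluster name='primary'/>")
        after_stage = start_server(staging)  # restart with a staged update
        clear_staged_update(staging, "primary")  # finalize applied it
        after_finalize = start_server(staging)["safe_mode"]
    return no_updates, after_stage, after_finalize


if __name__ == "__main__":
    print(demo())
```

Keeping the staged definitions on disk, rather than only in memory, is what lets a restarted server rediscover pending updates and re-enter safe mode on its own.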

> Support cluster updates
> -----------------------
>
>                 Key: FALCON-141
>                 URL: https://issues.apache.org/jira/browse/FALCON-141
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Shwetha G S
>            Assignee: Ajay Yadava
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
