falcon-dev mailing list archives

From "Balu Vellanki (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1728) Process entity definition allows multiple clusters when it has output Feed defined.
Date Thu, 07 Jan 2016 22:27:40 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088260#comment-15088260 ]

Balu Vellanki commented on FALCON-1728:

[~pallavi.rao] : You are correct. Running a process on a cluster that is not defined in the
feed is not allowed, and that is good.

[~ajayyadava] : I now understand why you want to run a process on multiple clusters. But there
is one scenario that Falcon should not allow. When a process runs on ClusterOne and ClusterTwo,
and its output Feed has ClusterOne as source cluster and ClusterTwo as target cluster, Falcon
should throw a validation exception. Falcon should only allow a process to run on its output
Feed's source clusters. The process's input Feeds can come from either source or target clusters.
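
To make the proposed rule concrete, here is a minimal sketch of the check in Python. It is only an illustration of the rule described above: the names Feed, Process, ValidationError and validate_process_clusters are hypothetical and are not Falcon's actual entity classes or validation API.

```python
from dataclasses import dataclass, field


class ValidationError(Exception):
    """Raised when a process definition violates the proposed rule."""


@dataclass
class Feed:
    # Hypothetical, simplified stand-in for a feed entity.
    name: str
    source_clusters: set = field(default_factory=set)
    target_clusters: set = field(default_factory=set)


@dataclass
class Process:
    # Hypothetical, simplified stand-in for a process entity.
    name: str
    clusters: set
    output_feeds: list = field(default_factory=list)


def validate_process_clusters(process):
    """Reject a process that runs on a cluster which is not a source
    cluster of every one of its output feeds."""
    for feed in process.output_feeds:
        not_source = process.clusters - feed.source_clusters
        if not_source:
            raise ValidationError(
                "Process '%s' runs on %s, which is not a source cluster "
                "of output feed '%s'"
                % (process.name, sorted(not_source), feed.name)
            )


out_feed = Feed("outputFeed", {"ClusterOne"}, {"ClusterTwo"})

# Allowed: the process runs only on the output feed's source cluster.
validate_process_clusters(Process("ok", {"ClusterOne"}, [out_feed]))

# Rejected: ClusterTwo is only a target cluster of the output feed.
try:
    validate_process_clusters(
        Process("bad", {"ClusterOne", "ClusterTwo"}, [out_feed])
    )
except ValidationError as err:
    print(err)
```

In this sketch a process with no output feed passes on any set of clusters, and input feeds are deliberately not checked, since they may come from either source or target clusters.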

If you agree, I will change the title/description of this Jira. Please comment.

> Process entity definition allows multiple clusters when it has output Feed defined. 
> ------------------------------------------------------------------------------------
>                 Key: FALCON-1728
>                 URL: https://issues.apache.org/jira/browse/FALCON-1728
>             Project: Falcon
>          Issue Type: Bug
>          Components: process
>    Affects Versions: 0.9
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>            Priority: Critical
> Process XSD allows a user to specify multiple clusters per process entity. I am guessing
this would allow a user to run duplicate instances of the process on multiple clusters at the
same time (I do not really see a need for this). When the process has an output feed defined,
duplicate process instances can write to the same feed instance, causing data corruption or failures.
The solution is one of the following:
> 1. Do not allow multiple clusters per process. Let the user define a duplicate process
if the user wants to run duplicate instances.
> OR
> 2. Allow multiple clusters, but only when there is no output feed defined.
> [~sriksun] please let me know if there is any other reason for allowing multiple clusters
in a process.

This message was sent by Atlassian JIRA
