falcon-dev mailing list archives

From "Ajay Yadava (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-1728) Process entity definition allows multiple clusters when it has output Feed defined.
Date Thu, 07 Jan 2016 05:02:39 GMT

    [ https://issues.apache.org/jira/browse/FALCON-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086807#comment-15086807 ]

Ajay Yadava commented on FALCON-1728:
-------------------------------------

[~bvellanki] It is most definitely not a bug, and as I said earlier, a lot of users are heavily
dependent on this feature, so removing it is not an option. If you can tell us what you want to
achieve that this feature is preventing, then we can probably suggest alternatives. Even if the
use case is just to stop certain users from creating certain hypothetical bad configurations,
removing this feature is definitely not the solution, and we should consider other alternatives.

Even without this feature there are many ways in which two processes can end up overwriting
the same location, so it is up to the users to configure their entities properly. 
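
For context, here is roughly what such a multi-cluster process entity looks like. This is a
minimal sketch against the Falcon process schema; the validity window, workflow path, and
instance expression are made-up values, only the entity names come from this thread:

{code:xml}
<process name="ProcessOne" xmlns="uri:falcon:process:0.1">
    <!-- Two clusters: Falcon schedules a coordinator on each one,
         so an instance of this process runs on both. -->
    <clusters>
        <cluster name="ClusterTwo">
            <validity start="2016-01-01T00:00Z" end="2017-01-01T00:00Z"/>
        </cluster>
        <cluster name="ClusterThree">
            <validity start="2016-01-01T00:00Z" end="2017-01-01T00:00Z"/>
        </cluster>
    </clusters>
    <parallel>1</parallel>
    <order>FIFO</order>
    <frequency>hours(1)</frequency>
    <timezone>UTC</timezone>
    <!-- Both scheduled instances resolve their output against the same feed. -->
    <outputs>
        <output name="out" feed="FeedOne" instance="now(0,0)"/>
    </outputs>
    <workflow engine="oozie" path="/apps/falcon/workflows/processOne"/>
</process>
{code}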

{quote}
Now you have a process ProcessOne whose output feed is FeedOne. The process is run on clusters
ClusterTwo and ClusterThree. When oozie runs the process instance, the user expects the output
data to be generated in
ClusterOne/apps/falcon/feedOne/location/{YEAR}-{MONTH}-{DAY}-{HOUR}
{quote}
When the process runs only on ClusterTwo and ClusterThree, why is the user expecting the data
to be generated in ClusterOne?

If two job instances of the process in different clusters are writing to the same output
directory, then your feed has the same location in different clusters. Why do you want that?
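
The usual way to avoid that is to give the feed a distinct location per cluster: a feed's
cluster entry can override the feed-level locations. A sketch of what I mean (paths, dates,
retention, and ACL/schema values are hypothetical):

{code:xml}
<feed name="FeedOne" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <timezone>UTC</timezone>
    <clusters>
        <cluster name="ClusterTwo" type="source">
            <validity start="2016-01-01T00:00Z" end="2017-01-01T00:00Z"/>
            <retention limit="days(7)" action="delete"/>
            <!-- Per-cluster locations override the feed-level default below. -->
            <locations>
                <location type="data"
                          path="/apps/falcon/feedOne/clusterTwo/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
            </locations>
        </cluster>
        <cluster name="ClusterThree" type="source">
            <validity start="2016-01-01T00:00Z" end="2017-01-01T00:00Z"/>
            <retention limit="days(7)" action="delete"/>
            <locations>
                <location type="data"
                          path="/apps/falcon/feedOne/clusterThree/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
            </locations>
        </cluster>
    </clusters>
    <!-- Feed-level default, used only where a cluster does not override it. -->
    <locations>
        <location type="data" path="/apps/falcon/feedOne/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
    </locations>
    <ACL owner="falcon" group="users" permission="0755"/>
    <schema location="/none" provider="none"/>
</feed>
{code}

With that, the two process instances write to different directories even though they share one
feed definition.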

Are you trying to suggest that replication will overwrite the data? In that case, did you
configure the partitions properly?
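
For reference, with multiple source clusters replicating to a common target, the feed is
expected to carry partition expressions so each source lands in its own sub-directory instead
of overwriting the others. A sketch of the relevant fragment (the cluster roles, dates, and
paths here are assumptions for illustration, not taken from your setup):

{code:xml}
<feed name="FeedOne" xmlns="uri:falcon:feed:0.1">
    <!-- One partition column; each source cluster fills it with its colo name,
         so replicated data from the two sources cannot collide on the target. -->
    <partitions>
        <partition name="colo"/>
    </partitions>
    <frequency>hours(1)</frequency>
    <clusters>
        <cluster name="ClusterTwo" type="source" partition="${cluster.colo}">
            <validity start="2016-01-01T00:00Z" end="2017-01-01T00:00Z"/>
            <retention limit="days(7)" action="delete"/>
        </cluster>
        <cluster name="ClusterThree" type="source" partition="${cluster.colo}">
            <validity start="2016-01-01T00:00Z" end="2017-01-01T00:00Z"/>
            <retention limit="days(7)" action="delete"/>
        </cluster>
        <cluster name="ClusterOne" type="target">
            <validity start="2016-01-01T00:00Z" end="2017-01-01T00:00Z"/>
            <retention limit="days(30)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/apps/falcon/feedOne/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
    </locations>
    <ACL owner="falcon" group="users" permission="0755"/>
    <schema location="/none" provider="none"/>
</feed>
{code}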

> Process entity definition allows multiple clusters when it has output Feed defined. 
> ------------------------------------------------------------------------------------
>
>                 Key: FALCON-1728
>                 URL: https://issues.apache.org/jira/browse/FALCON-1728
>             Project: Falcon
>          Issue Type: Bug
>          Components: process
>    Affects Versions: 0.9
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>            Priority: Critical
>
> Process XSD allows the user to specify multiple clusters per process entity. I am guessing
> this would allow a user to run duplicate instances of the process on multiple clusters at the
> same time (I do not really see a need for this). When the process has an output feed defined,
> you can have duplicate process instances writing to the same feed instance, causing data
> corruption/failures. The solution is to either:
> 1. Disallow multiple clusters per process. Let the user define a duplicate process if they
> want to run duplicate instances.
> OR
> 2. Allow multiple clusters, but only when there is no output feed defined.
> [~sriksun] please let me know if there is any other reason for allowing multiple clusters
> in a process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
