falcon-dev mailing list archives

From "Venkatesh Seetharam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-288) Persist lineage information into a persistent store
Date Tue, 11 Feb 2014 00:48:21 GMT

    [ https://issues.apache.org/jira/browse/FALCON-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897363#comment-13897363 ]

Venkatesh Seetharam commented on FALCON-288:

The entity dependency graph looks good, except that we can add 2 more things:
* colo as a vertex, with an edge from cluster to colo labeled "collocated"
* workflow as a vertex, with an edge from process to workflow labeled "executes". I'd like to
capture versioning on workflows, and it makes sense to capture that in an instance.

Also, the entity vertices have 2 keys that are indexed:
* *name* = entity-name which should be unique
* *type* = entity type
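
To make the proposal concrete, here is a minimal in-memory sketch of the entity graph with the two extra vertices and the two indexed keys. It is an illustration only, not Falcon code and not tied to any particular graph database: the class `EntityGraph`, its methods, and the entity names are all hypothetical. It also assumes *name* is unique across all entity types, which is one reading of the above.

```python
from collections import defaultdict


class EntityGraph:
    """Toy property graph for illustration; not Falcon's actual store."""

    def __init__(self):
        self.vertices = {}               # (name, type) -> properties
        self.by_name = {}                # index on "name" (assumed globally unique)
        self.by_type = defaultdict(set)  # index on "type"
        self.edges = defaultdict(set)    # source key -> {(label, target key)}

    def add_vertex(self, name, type_):
        if name in self.by_name:
            raise ValueError(f"entity name already exists: {name}")
        key = (name, type_)
        self.vertices[key] = {"name": name, "type": type_}
        self.by_name[name] = key
        self.by_type[type_].add(key)
        return key

    def add_edge(self, source, label, target):
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must already be vertices")
        self.edges[source].add((label, target))

    def out(self, source, label):
        """Return the targets of edges leaving `source` with the given label."""
        return sorted(t for (lbl, t) in self.edges[source] if lbl == label)


# Hypothetical entities, named only for this example.
g = EntityGraph()
cluster = g.add_vertex("primary-cluster", "cluster")
colo = g.add_vertex("west-colo", "colo")
process = g.add_vertex("clean-logs", "process")
workflow = g.add_vertex("clean-logs-wf", "workflow")

g.add_edge(cluster, "collocated", colo)    # cluster -> colo
g.add_edge(process, "executes", workflow)  # process -> workflow
```

With both indexes in place, a lookup by name or by type does not need to scan the vertices, e.g. `g.by_name["clean-logs"]` or `g.by_type["workflow"]`.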

I have a few questions on the instance lineage graph:
* Is the edge to the cluster redundant? It's already there for a feed, no? Could the feed get
updated with a new cluster?
* How to version workflow instances?
* How do we treat reinstatements of instances? Delete old and retain the latest?

Instance keys:
* name = instance-id (timed partition)
* type = entity-type
* creation-time = timestamp
* workflow will have workflowId, subflowId, engine-url, and engine
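
The instance keys above could be carried as vertex properties along these lines. This is a rough sketch under stated assumptions: the class names, the `properties()` helper, and every sample value (instance id, workflow id, engine URL) are made up for illustration, and it does not address the open questions on versioning or reinstatements.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class InstanceVertex:
    name: str                # instance-id (timed partition)
    type: str                # entity-type
    creation_time: datetime  # timestamp

    def properties(self):
        """Flatten to the property names used in the list above."""
        return {
            "name": self.name,
            "type": self.type,
            "creation-time": self.creation_time.isoformat(),
        }


@dataclass(frozen=True)
class WorkflowInstanceVertex(InstanceVertex):
    workflow_id: str
    subflow_id: str
    engine_url: str
    engine: str

    def properties(self):
        props = super().properties()
        props.update({
            "workflowId": self.workflow_id,
            "subflowId": self.subflow_id,
            "engine-url": self.engine_url,
            "engine": self.engine,
        })
        return props


# All values below are hypothetical placeholders.
wf_instance = WorkflowInstanceVertex(
    name="clean-logs-wf/2014-02-11T00:00Z",
    type="workflow",
    creation_time=datetime(2014, 2, 11, 0, 48, 21, tzinfo=timezone.utc),
    workflow_id="example-workflow-id",
    subflow_id="example-subflow-id",
    engine_url="http://localhost:11000/oozie",
    engine="oozie",
)
props = wf_instance.properties()
```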


> Persist lineage information into a persistent store
> ---------------------------------------------------
>                 Key: FALCON-288
>                 URL: https://issues.apache.org/jira/browse/FALCON-288
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.5
>            Reporter: Venkatesh Seetharam
>              Labels: lineage
>         Attachments: Dependency Graph.png, Lineage Over Dependency.png
> Need to evaluate the store: RDBMS vs. graph DB. Leaning towards the latter since the data
> is hierarchical.

This message was sent by Atlassian JIRA
