falcon-dev mailing list archives

From "Venkatesh Seetharam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-288) Persist lineage information into a persistent store
Date Sun, 02 Mar 2014 06:17:19 GMT

    [ https://issues.apache.org/jira/browse/FALCON-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917297#comment-13917297 ]

Venkatesh Seetharam commented on FALCON-288:
--------------------------------------------

Thanks [~sriksun] for reviewing the patch over the weekend. Sincerely appreciate it.

bq. Why do we need a user node attached to the cluster vertex? That relation isn't very useful
and is likely to be misleading as well.
The intent was to capture the user who created this cluster, not an owner per se.

bq. addVertex() checks for the existence of the vertex; however, a similar check is not done
for edges.
Good catch. Will add this.
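
A minimal sketch of what such an edge existence check could look like. It uses a hypothetical
in-memory map as a stand-in for the graph store, not the patch's actual code or any graph
library API; the class name, the method signature and the key format are all assumptions made
for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the graph; not the Falcon patch or a graph library API.
class EdgeDedupSketch {
    // Key is "outVertex|label|inVertex" (assumed format); value is the edge id.
    private final Map<String, Integer> edges = new HashMap<>();
    private int nextId = 0;

    // Returns the id of the existing edge if the same (out, label, in) triple was
    // added before; otherwise creates a new edge and returns its id.
    int addEdge(String out, String label, String in) {
        String key = out + "|" + label + "|" + in;
        Integer existing = edges.get(key);
        if (existing != null) {
            return existing;
        }
        int id = nextId++;
        edges.put(key, id);
        return id;
    }

    int edgeCount() {
        return edges.size();
    }

    public static void main(String[] args) {
        EdgeDedupSketch graph = new EdgeDedupSketch();
        graph.addEdge("process-1", "output", "feed-1");
        graph.addEdge("process-1", "output", "feed-1"); // duplicate, no new edge
        System.out.println("edges: " + graph.edgeCount());
    }
}
```

With this shape, adding the same edge twice is harmless, which mirrors what addVertex() is
described as doing for vertices.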

bq. it might be useful to not assume the default edge label to be "output", but to actually
check for it and throw an assertion error otherwise.
Ok, makes sense.
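
A minimal sketch of the explicit check, assuming there are only two edge labels. The "output"
label comes from the review comment; the "input" label, the class name and the returned
descriptions are hypothetical and only illustrate the control flow.

```java
// Hypothetical helper; label handling in the actual patch may differ.
class EdgeLabelSketch {
    static final String INPUT = "input";   // assumed label name
    static final String OUTPUT = "output"; // label named in the review comment

    // Resolves a label explicitly instead of treating anything
    // that is not "input" as "output".
    static String describe(String label) {
        if (INPUT.equals(label)) {
            return "feed instance consumed by process instance";
        }
        if (OUTPUT.equals(label)) {
            return "feed instance produced by process instance";
        }
        throw new AssertionError("Unexpected edge label: " + label);
    }

    public static void main(String[] args) {
        System.out.println(describe(OUTPUT));
    }
}
```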

bq. This is going to be a little tricky. If you leave behind vertices, even after all incident
edges are removed, the database is going to grow monotonically in size and cause performance
issues down the line.
This will never be the case in this model. Also, I don't think we will ever have thousands
of entities to work with, no?

bq. Is the motivation of adding classification & groups relationship for every instance
to provide "WHAT-WAS" view of the feed instance?
Yes, but I am not religious about it if it does indeed affect performance. These are only
edges; no new vertices are added.

bq. Why is workflowInstance a separate node in the graph and not a set of properties on the
process instance?
I thought this could capture changes to workflows and let us add more properties down the
line, as you describe with reruns.

bq. I can imagine this being useful in re-run scenarios, but I don't see that run-relationship
being captured though.
The run id is captured as a property. This is an initial implementation and needs further
work.

bq. It is reasonable to leave behind graph elements after an entity is deleted to allow historical
queries. However there has to be some cleanup based on time limit that ought to be available.

This is also what we discussed: leaving elements behind. The cleanup can be time-based, run in
the background, and come in a separate JIRA.
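
A minimal sketch of a time-based cleanup, assuming each left-behind element records when its
entity was deleted. The class, the method names and the use of epoch milliseconds are
hypothetical; nothing here is taken from the patch.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical bookkeeping for elements left behind after entity deletion.
class RetentionCleanupSketch {
    // Element id -> time the owning entity was deleted, in epoch milliseconds.
    private final Map<String, Long> deletedAt = new HashMap<>();

    void markDeleted(String elementId, long nowMillis) {
        deletedAt.put(elementId, nowMillis);
    }

    // Removes elements deleted more than retentionMillis ago; returns how many were purged.
    int purgeOlderThan(long retentionMillis, long nowMillis) {
        int purged = 0;
        Iterator<Map.Entry<String, Long>> it = deletedAt.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > retentionMillis) {
                it.remove();
                purged++;
            }
        }
        return purged;
    }

    int size() {
        return deletedAt.size();
    }

    public static void main(String[] args) {
        RetentionCleanupSketch store = new RetentionCleanupSketch();
        store.markDeleted("feed-1", 0L);
        store.markDeleted("feed-2", 900L);
        System.out.println("purged: " + store.purgeOlderThan(500L, 1000L));
    }
}
```

A background thread could call purgeOlderThan() on a schedule, so historical queries keep
working within the retention window while the store stays bounded.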

> Persist lineage information into a persistent store
> ---------------------------------------------------
>
>                 Key: FALCON-288
>                 URL: https://issues.apache.org/jira/browse/FALCON-288
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.5
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>              Labels: lineage
>         Attachments: Dependency Graph.png, FALCON-288-Hive-Review.patch, FALCON-288-review-v1.patch,
>                      FALCON-288-review.patch, FALCON-288-v1.patch, Lineage Over Dependency.png
>
>
> Need to evaluate the store - rdbms vs graph db. Leaning towards latter since the data
> is hierarchical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
