atlas-dev mailing list archives

From "Shwetha G S (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ATLAS-58) Make hive hook reliable
Date Tue, 15 Sep 2015 06:51:45 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744923#comment-14744923
] 

Shwetha G S edited comment on ATLAS-58 at 9/15/15 6:51 AM:
-----------------------------------------------------------

The hive hook sends notification messages (each a list of entities). The notification consumer on the
server side consumes these messages and registers the entities. The server handles de-duping of
entities based on the unique attribute of the entity.
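
The flow above can be sketched roughly as follows. This is only an illustration in Python, not the Atlas implementation: `Entity`, `EntityRegistry`, `register`, `consume` and the example type names are hypothetical, and an in-memory dict stands in for the real repository. It assumes an entity with no unique attribute always results in a new entity.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional, Tuple


@dataclass
class Entity:
    type_name: str
    attributes: Dict[str, str]
    # Name of the attribute treated as unique, or None if there is none.
    unique_attribute: Optional[str] = None


class EntityRegistry:
    """Hypothetical in-memory stand-in for the server-side entity store."""

    def __init__(self) -> None:
        self._ids = count(1)
        # (type name, attribute name, attribute value) -> entity id
        self._by_unique: Dict[Tuple[str, str, str], int] = {}
        self.entities: Dict[int, Entity] = {}

    def register(self, entity: Entity) -> int:
        key = None
        if entity.unique_attribute is not None:
            key = (entity.type_name, entity.unique_attribute,
                   entity.attributes[entity.unique_attribute])
            existing = self._by_unique.get(key)
            if existing is not None:
                return existing  # de-duped: reuse the existing entity id
        # No unique attribute, or first time this value is seen: create new.
        new_id = next(self._ids)
        self.entities[new_id] = entity
        if key is not None:
            self._by_unique[key] = new_id
        return new_id

    def consume(self, message: List[Entity]) -> List[int]:
        """Handle one notification message, i.e. a list of entities."""
        return [self.register(e) for e in message]
```

With this sketch, consuming the same table entity twice returns the same id, while two entities without a unique attribute get two different ids.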

Big changes:
1. Concept of services that are started and stopped at atlas start and stop
2. De-duping of entities on the server based on any unique attribute of the entity. If an entity
doesn't have any unique attribute, de-duping is not done and a new entity is created
3. Changed the entity submit API to take a list of entities instead of just 1 entity (required for
the hive hook)
4. Moved security tests from integration tests to unit tests, as they were causing issues
with server start because jetty already starts another server for integration tests
5. Removed some duplicate tests from the repository module (the same tests exist in the typesystem
module as well)
6. In webapp ITs, re-used the types defined
7. The hive hook now sends notifications instead of registering entities. Sending a notification
is done synchronously, so this adds to hive command execution delay, but it also makes
the hook reliable
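
As an illustration of item 1, a service lifecycle could look something like the sketch below. The names `Service`, `RecordingService` and `Services` are hypothetical and not taken from the patch, and stopping in reverse start order is an assumption made here for the example.

```python
from typing import Iterable, List


class Service:
    """Hypothetical base class: something started and stopped with the server."""

    def start(self) -> None:
        raise NotImplementedError

    def stop(self) -> None:
        raise NotImplementedError


class RecordingService(Service):
    """Toy service that only records its lifecycle calls in a shared log."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def start(self) -> None:
        self.log.append(f"start {self.name}")

    def stop(self) -> None:
        self.log.append(f"stop {self.name}")


class Services:
    """Starts all registered services at server start, stops them at server stop."""

    def __init__(self, services: Iterable[Service]) -> None:
        self._services = list(services)
        self._started: List[Service] = []

    def start(self) -> None:
        for service in self._services:
            service.start()
            self._started.append(service)

    def stop(self) -> None:
        # Assumption: stop in reverse start order, and only what was started.
        while self._started:
            self._started.pop().stop()
```

In this sketch the notification consumer would be one such service, so it starts listening when the server starts and is shut down when the server stops.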

Pending:
1. Entity updates like alter table commands are not handled. Will create another jira for
this
2. The webapp jetty plugin doesn't shut down embedded kafka at the end of integration tests, so
hive bridge ITs fail. Hive bridge ITs pass if run on their own. Still checking on this


> Make hive hook reliable
> -----------------------
>
>                 Key: ATLAS-58
>                 URL: https://issues.apache.org/jira/browse/ATLAS-58
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Shwetha G S
>            Assignee: Shwetha G S
>              Labels: incompatible
>             Fix For: trunk
>
>         Attachments: ATLAS-58-v2.patch, ATLAS-58.patch
>
>
> Currently, the hive hook executes in a background thread pool and is a best-effort approach
> to registering entities. But this needs to be reliable for data governance to be effective.
> One way is: in the hive hook, add the entities to some messaging framework, and the atlas server
> can read the entities from the messages and register them in atlas. Since posting a message is
> faster, we can do it synchronously and hence get reliable entity registration.
> We can start with kafka for messaging, but any other messaging framework should be pluggable
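
One possible shape for the pluggable, synchronous messaging described in the issue is sketched below. It is not the patch's API: `NotificationSender`, `InMemorySender`, `FailingSender` and `run_hook` are hypothetical names, and an in-process queue stands in for kafka so the example stays self-contained.

```python
from abc import ABC, abstractmethod
from queue import Queue
from typing import Dict, List


class NotificationSender(ABC):
    """Hypothetical pluggable interface; one implementation per messaging framework."""

    @abstractmethod
    def send(self, entities: List[Dict[str, str]]) -> None:
        """Return only once the message is accepted; raise on failure."""


class InMemorySender(NotificationSender):
    """Stand-in for a real broker client, backed by an in-process queue."""

    def __init__(self) -> None:
        self.queue: Queue = Queue()

    def send(self, entities: List[Dict[str, str]]) -> None:
        self.queue.put(list(entities))


class FailingSender(NotificationSender):
    """Simulates an unavailable broker."""

    def send(self, entities: List[Dict[str, str]]) -> None:
        raise ConnectionError("broker unavailable")


def run_hook(sender: NotificationSender, entities: List[Dict[str, str]]) -> None:
    # Synchronous: this returns only after send() returns, so a failure
    # reaches the caller instead of being lost in a background thread.
    sender.send(entities)
```

The trade-off is the one noted in the comment: the command waits for the send to complete, but a failed send is visible to the caller rather than silently dropped.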



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
