atlas-dev mailing list archives

From Hemanth Yamijala <yhema...@gmail.com>
Subject Re: Review Request 48939: ATLAS-904 Handle process qualified name per Hive Operation
Date Mon, 20 Jun 2016 09:11:38 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48939/#review138562
-----------------------------------------------------------




addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java (line 50)
<https://reviews.apache.org/r/48939/#comment203755>

    This seems unused actually.


- Hemanth Yamijala


On June 20, 2016, 4 a.m., Suma Shivaprasad wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48939/
> -----------------------------------------------------------
> 
> (Updated June 20, 2016, 4 a.m.)
> 
> 
> Review request for atlas, Shwetha GS and Hemanth Yamijala.
> 
> 
> Bugs: ATLAS-904
>     https://issues.apache.org/jira/browse/ATLAS-904
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> 1. Process qualified name = HiveOperation.name + sorted inputs + sorted outputs.
> 2. HiveOperation.name does not provide identifiers that distinguish INSERT, INSERT_OVERWRITE, UPDATE, DELETE, etc. Hence WriteEntity.WriteType is added as well, with the following behaviour:
> a. If there are multiple outputs, the query type (WriteType) is added for each output.
> b. If the query is of type INSERT [INTO/OVERWRITE] TABLE [PARTITION], WriteType is INSERT/INSERT_OVERWRITE.
> c. If the query is of type INSERT OVERWRITE hdfs_path, WriteType is PATH_WRITE.
> d. If the query is of type UPDATE/DELETE, WriteType is UPDATE/DELETE. [Note: lineage is not available for this, since it is a single-table operation.]
> 3. When an input is a local directory or HDFS path, it is currently not added to the qualified name. The reason is that partition-based paths cause a lot of processes to be created in this case instead of updating the same process.
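For illustration only, the naming scheme in items 1 and 2 can be sketched as below. This is not the code from the patch under review: the class name, the method name, the ":" and "->" separators, and the use of plain strings for entities and write types are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; not the actual HiveHook implementation. */
final class ProcessQualifiedNameSketch {
    // Assumed separator; the real patch may use a different one.
    private static final String SEP = ":";

    /**
     * Builds operation name + sorted inputs + sorted outputs, where every
     * output carries its write type (e.g. INSERT, INSERT_OVERWRITE, PATH_WRITE).
     *
     * @param operationName name of the Hive operation, e.g. "QUERY"
     * @param inputs        qualified names of the input entities, in any order
     * @param outputs       output qualified name mapped to its write type
     */
    static String build(String operationName, List<String> inputs, Map<String, String> outputs) {
        StringBuilder sb = new StringBuilder(operationName);

        // Sort a copy so the same set of inputs always yields the same name.
        List<String> sortedInputs = new ArrayList<>(inputs);
        Collections.sort(sortedInputs);
        for (String input : sortedInputs) {
            sb.append(SEP).append(input);
        }

        // Assumed marker between the input part and the output part.
        sb.append("->");

        // TreeMap iterates in key order, so outputs are sorted as well.
        for (Map.Entry<String, String> output : new TreeMap<>(outputs).entrySet()) {
            sb.append(SEP).append(output.getKey()).append(SEP).append(output.getValue());
        }
        return sb.toString();
    }
}
```

Because inputs and outputs are sorted before they are appended, two runs of the same query that list their entities in a different order produce the same qualified name, so the existing process can be updated rather than a new one created.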
> Pending:
> Address Shwetha G S's suggestion to add HDFS paths to the process qualified name only for non-partition-based queries. This needs to be done per HiveOperation type:
> 1. If HiveOperation = LOAD, IMPORT, EXPORT: detect whether the current query context is dealing with partitions, and do not add the path if it is partition based.
> 2. If HiveOperation = INSERT OVERWRITE DFS_PATH/LOCAL_PATH: detect whether the query context is dealing with a partitioned table in the inputs, and decide whether the path needs to be added.
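As a rough sketch of the pending per-operation decision, assuming the partition check has already been reduced to a boolean: the enum, class, and method names below are hypothetical and are not Hive's HiveOperation constants. The rule for INSERT OVERWRITE to a path is also an assumption, since the description leaves that case undecided.

```java
/** Hypothetical sketch of the pending rule; not part of the patch under review. */
final class PathInclusionSketch {
    /** Illustrative grouping of operations; these are not HiveOperation values. */
    enum OpKind { LOAD, IMPORT, EXPORT, INSERT_OVERWRITE_PATH, OTHER }

    /**
     * Decides whether an HDFS or local path should become part of the
     * process qualified name.
     *
     * @param kind           which group of operations the query belongs to
     * @param partitionBased true if the query context deals with partitions
     */
    static boolean includePath(OpKind kind, boolean partitionBased) {
        switch (kind) {
            case LOAD:
            case IMPORT:
            case EXPORT:
                // Stated in the description: skip the path for partition-based queries.
                return !partitionBased;
            case INSERT_OVERWRITE_PATH:
                // Assumption: reuse the same rule; the description leaves this open.
                return !partitionBased;
            default:
                // Keeps the behaviour described in item 3: paths are left out.
                return false;
        }
    }
}
```

For example, `includePath(OpKind.LOAD, true)` returns false, so a LOAD into a partition would not add its path to the qualified name.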
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java c956a32
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 23c82df
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e7fbf71
>   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 0713d30
> 
> Diff: https://reviews.apache.org/r/48939/diff/
> 
> 
> Testing
> -------
> 
> Existing tests were modified to query with the new qualified name. Tests for INSERT INTO TABLE still need to be added.
> 
> 
> Thanks,
> 
> Suma Shivaprasad
> 
>

