atlas-dev mailing list archives

From Suma Shivaprasad <sumasai.shivapra...@gmail.com>
Subject Re: Review Request 48939: ATLAS-904 Handle process qualified name per Hive Operation
Date Mon, 20 Jun 2016 17:12:40 GMT


> On June 20, 2016, 9:26 a.m., Hemanth Yamijala wrote:
> > addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java, line 763
> > <https://reviews.apache.org/r/48939/diff/2/?file=1423788#file1423788line763>
> >
> >     Do we need a separator between the input set and output set?

This is already taken care of within the if checks: the separator is added before an output
dataset entry is added to the buffer.
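
To illustrate the ordering, here is a minimal sketch (not the HiveHook code itself) of how the
name described in the review request quoted below could be assembled: operation name, then
sorted inputs, then for each sorted output its WriteType followed by the dataset name, with a
separator appended before every entry. The class name, the method name, and the ":" separator
are assumptions made for this example only.

```java
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class ProcessQualifiedNameSketch {
    // Hypothetical separator; the value used by the real hook may differ.
    static final String SEP = ":";

    /**
     * Builds operationName + sorted inputs + sorted outputs.
     * outputs maps an output dataset name to its write type (e.g. INSERT).
     */
    static String build(String operationName,
                        SortedSet<String> inputs,
                        SortedMap<String, String> outputs) {
        StringBuilder buffer = new StringBuilder(operationName);
        for (String input : inputs) {
            // Separator goes in front of every input entry.
            buffer.append(SEP).append(input);
        }
        for (String output : outputs.keySet()) {
            // Separator is appended before the output entry, so inputs and
            // outputs never run together; the write type precedes the name.
            buffer.append(SEP).append(outputs.get(output));
            buffer.append(SEP).append(output);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        SortedSet<String> inputs = new TreeSet<>();
        inputs.add("db.t2");
        inputs.add("db.t1");
        SortedMap<String, String> outputs = new TreeMap<>();
        outputs.put("db.out", "INSERT");
        System.out.println(build("QUERY", inputs, outputs));
    }
}
```

Because both collections are sorted, the same query produces the same name regardless of the
order in which Hive reports its inputs and outputs.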


- Suma


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48939/#review138565
-----------------------------------------------------------


On June 20, 2016, 4 a.m., Suma Shivaprasad wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48939/
> -----------------------------------------------------------
> 
> (Updated June 20, 2016, 4 a.m.)
> 
> 
> Review request for atlas, Shwetha GS and Hemanth Yamijala.
> 
> 
> Bugs: ATLAS-904
>     https://issues.apache.org/jira/browse/ATLAS-904
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> 1. Process qualified name = HiveOperation.name + sorted inputs + sorted outputs
> 2. HiveOperation.name doesn't provide identifiers for distinguishing INSERT, INSERT_OVERWRITE,
>    UPDATE, DELETE etc. separately. Hence WriteEntity.WriteType is added as well, with
>    the following behaviour:
> a. If there are multiple outputs, adds the query type (WriteType) for each output
> b. If the query being run is of type INSERT [into/overwrite] TABLE [PARTITION], WriteType
>    is INSERT/INSERT_OVERWRITE
> c. If the query is of type INSERT OVERWRITE hdfs_path, adds WriteType as PATH_WRITE
> d. If the query is of type UPDATE/DELETE, adds the type as UPDATE/DELETE [Note: lineage is
>    not available for this since it is a single-table operation]
> 3. When the input is a local dir or hdfs path, it is currently not added to the qualified
>    name. The reason is that partition-based paths cause a lot of processes to be created in
>    this case instead of updating the same process.
> Pending:
> Address Shwetha G S's suggestion to add hdfs paths to the process qualified name only in
> case of non-partition-based queries. This needs to be done per HiveOperation type:
> 1. If HiveOperation = LOAD, IMPORT, EXPORT, detect whether the current query context is
>    dealing with partitions and do not add the path if it is partition based.
> 2. If HiveOperation = INSERT OVERWRITE DFS_PATH/LOCAL_PATH, detect whether the query
>    context is dealing with a partitioned table in inputs and decide whether to add the path.
> 
> 
> Diffs
> -----
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java c956a32
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 23c82df
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e7fbf71
>   webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 0713d30
> 
> Diff: https://reviews.apache.org/r/48939/diff/
> 
> 
> Testing
> -------
> 
> Existing tests modified to query with the new qualified name. Tests for INSERT INTO TABLE
> still need to be added.
> 
> 
> Thanks,
> 
> Suma Shivaprasad
> 
>

