atlas-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ATLAS-904) Hive hook fails due to session state not being set
Date Thu, 16 Jun 2016 05:26:05 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333127#comment-15333127 ]

Hemanth Yamijala commented on ATLAS-904:
----------------------------------------

[~suma.shivaprasad], I don't know the full details of the code - so please treat my review
comments with a pinch of salt.

There are 2 changes that this patch covers:
* ATLAS-877 - Here we switched from getting a DDL time to the create time. This change seems
right to me. +1.
* ATLAS-904 - Instead of fixing the bug reported here, we have essentially removed its
cause, i.e., the normalize method itself.

So, let's focus on the removal of the normalization. From what I understand, normalization
was introduced because Atlas currently does not model partition-level lineage. By removing
literals from queries involving two sets of DataSets (inputs and outputs), we were 'normalizing'
partition-level changes so that they look like table-level changes. Further, we were capturing
the most recent query that ran on this set. (It appears that this was an array of latest
queries, but I don't know whether we were appending to the array or replacing it, in which
case we would capture only the latest query.)
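
To illustrate what 'normalizing' means here, below is a minimal sketch. It is a simplified, regex-based stand-in and not the AST rewrite that HiveASTRewriter performed; the class name LiteralNormalizer and the choice of '?' as placeholder are assumptions made only for illustration. It shows two queries that differ only in their partition literals collapsing to the same text.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of the Atlas code base).
class LiteralNormalizer {
    // Single-quoted string literal, allowing backslash-escaped characters inside.
    private static final Pattern STRING_LITERAL = Pattern.compile("'(?:[^'\\\\]|\\\\.)*'");
    // Stand-alone integer or decimal literal (digits inside identifiers such as t1 are not matched).
    private static final Pattern NUMBER_LITERAL = Pattern.compile("\\b\\d+(?:\\.\\d+)?\\b");

    // Replaces string literals first, then numeric literals, with '?'.
    static String normalize(String query) {
        String withoutStrings = STRING_LITERAL.matcher(query).replaceAll("?");
        return NUMBER_LITERAL.matcher(withoutStrings).replaceAll("?");
    }
}
```

With this sketch, "INSERT INTO TABLE t PARTITION (dt='2016-06-15') SELECT * FROM s WHERE id = 42" and the same query with dt='2016-06-16' both normalize to one string, which is how partition-level changes end up looking like a single table-level process. A regex cannot handle every Hive construct (comments, double-quoted strings), which is one reason a real implementation would work on the parsed query.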

I think until we support partition-level lineage, sticking to the above model is useful. If
normalization is costly, as the Hive SMEs seem to be telling us, then can we just make the
process name very generic, capturing {set of inputs} -> {set of outputs} in sorted order
of input and output names? We could still store the actual (unnormalized) query in the array
of latest queries (I would prefer that this be a bounded array, say the last 100 entries or
a configurable number).
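
A rough sketch of this proposal follows. All names in it (ProcessSketch, processName, RecentQueries) are hypothetical and do not come from the Atlas code base, and the exact name format is an assumption. It derives a process name from the sorted input and output names, and keeps the raw queries in a bounded buffer that evicts the oldest entry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch, for illustration only.
class ProcessSketch {

    // Builds "{in1,in2} -> {out1}" with names sorted, so the same set of
    // datasets yields the same process name regardless of the query text.
    static String processName(Collection<String> inputs, Collection<String> outputs) {
        TreeSet<String> sortedInputs = new TreeSet<>(inputs);
        TreeSet<String> sortedOutputs = new TreeSet<>(outputs);
        return "{" + String.join(",", sortedInputs) + "} -> {"
                + String.join(",", sortedOutputs) + "}";
    }

    // Keeps only the most recent maxEntries raw (unnormalized) queries.
    static final class RecentQueries {
        private final int maxEntries;
        private final Deque<String> queries = new ArrayDeque<>();

        RecentQueries(int maxEntries) {
            if (maxEntries <= 0) {
                throw new IllegalArgumentException("maxEntries must be positive");
            }
            this.maxEntries = maxEntries;
        }

        // Appends a query, dropping the oldest one when the buffer is full.
        void add(String query) {
            if (queries.size() == maxEntries) {
                queries.removeFirst();
            }
            queries.addLast(query);
        }

        // Returns the stored queries, oldest first.
        List<String> snapshot() {
            return new ArrayList<>(queries);
        }
    }
}
```

For example, processName for inputs db.t2 and db.t1 with output db.out gives "{db.t1,db.t2} -> {db.out}" in either input order. The bound (100 in the suggestion above) would be passed to the RecentQueries constructor, presumably from configuration.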

I believe this is a more usable solution than showing all the original unnormalized queries,
which could be very large, for all we know. Please let me know if this makes sense.


> Hive hook fails due to session state not being set
> --------------------------------------------------
>
>                 Key: ATLAS-904
>                 URL: https://issues.apache.org/jira/browse/ATLAS-904
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 0.7-incubating
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>            Priority: Blocker
>             Fix For: 0.7-incubating
>
>         Attachments: ATLAS-904.1.patch, ATLAS-904.patch
>
>
> {noformat}
> 2016-06-15 11:34:30,423 WARN  [Atlas Logger 0]: hook.HiveHook (HiveHook.java:normalize(557)) - Could not rewrite query due to error. Proceeding with original query EXPORT TABLE test_export_table to 'hdfs://localhost:9000/hive_tables/test_path1'
> java.lang.NullPointerException: Conf non-local session path expected to be non-null
> 	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> 	at org.apache.hadoop.hive.ql.session.SessionState.getHDFSSessionPath(SessionState.java:641)
> 	at org.apache.hadoop.hive.ql.Context.<init>(Context.java:133)
> 	at org.apache.hadoop.hive.ql.Context.<init>(Context.java:120)
> 	at org.apache.atlas.hive.rewrite.HiveASTRewriter.<init>(HiveASTRewriter.java:44)
> 	at org.apache.atlas.hive.hook.HiveHook.normalize(HiveHook.java:554)
> 	at org.apache.atlas.hive.hook.HiveHook.getProcessReferenceable(HiveHook.java:702)
> 	at org.apache.atlas.hive.hook.HiveHook.registerProcess(HiveHook.java:596)
> 	at org.apache.atlas.hive.hook.HiveHook.fireAndForget(HiveHook.java:222)
> 	at org.apache.atlas.hive.hook.HiveHook.access$200(HiveHook.java:77)
> 	at org.apache.atlas.hive.hook.HiveHook$2.run(HiveHook.java:182)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 2016-06-15 11:34:30,423 ERROR [Atlas Logger 0]: hook.HiveHook (HiveHook.java:run(184)) - Atlas hook failed due to error
> java.lang.NullPointerException
> 	at java.lang.StringBuilder.<init>(StringBuilder.java:109)
> 	at org.apache.atlas.hive.hook.HiveHook.getProcessQualifiedName(HiveHook.java:738)
> 	at org.apache.atlas.hive.hook.HiveHook.getProcessReferenceable(HiveHook.java:703)
> 	at org.apache.atlas.hive.hook.HiveHook.registerProcess(HiveHook.java:596)
> 	at org.apache.atlas.hive.hook.HiveHook.fireAndForget(HiveHook.java:222)
> 	at org.apache.atlas.hive.hook.HiveHook.access$200(HiveHook.java:77)
> 	at org.apache.atlas.hive.hook.HiveHook$2.run(HiveHook.java:182)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
