hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Sherman via Review Board <nore...@reviews.apache.org>
Subject Re: Review Request 64193: HIVE-18054: Make Lineage work with concurrent queries on a Session
Date Wed, 06 Dec 2017 23:06:16 GMT


> On Dec. 2, 2017, 12:22 a.m., Sahil Takiar wrote:
> > Since we touch the `LoadSemanticAnalyzer` could we add a q-test (could be added
to one of the existing `lineage*.q` files) for `LOAD` statements. Same for import / export
statements (as far as I can tell there are no existing ones, correct me if I am wrong).
> > 
> > If you have time, it would be great to run some of the lineage tests for HoS too,
but since thats a bit orthogonal to this JIRA, it can be done in a follow up JIRA.
> 
> Andrew Sherman wrote:
>     I will addsome more tests...

Update: I looked into his some more. It turns out that Lineage info is printed by default
by the PostExecutePrinter.
So any .q tests should show their lineage in the output.
HoS tests do print some lineage so I think that part is covered.
But currently no lineage is printed for LOAD/IMPORT/EXPORT.
The MoveTask does call getLineageState().setLineage() to set lineage into the LineageState
but it does not call getLineageState().mapDirToOp().
Possibly this is because LOAD statements don't have a org.apache.hadoop.hive.ql.exec.Operator
And IMPORT statements behave similarly.
IMO the whole Lineage stuff is not clearly specified. 
Do you think it is worth doing follow-up work to try to document /test) it better?


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64193/#review192601
-----------------------------------------------------------


On Nov. 30, 2017, 1:22 a.m., Andrew Sherman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64193/
> -----------------------------------------------------------
> 
> (Updated Nov. 30, 2017, 1:22 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> A Hive Session can contain multiple concurrent sql Operations.
> Lineage is currently tracked in SessionState and is cleared when a query
> completes. This results in Lineage for other running queries being lost.
> 
> To fix this, move LineageState from SessionState to QueryState.
> In MoveTask/MoveWork use the LineageState from the MoveTask's QueryState
> rather than trying to use it from MoveWork.
> Add a test which runs multiple jdbc queries in a thread pool
> against the same connection and show that Vertices are not lost from Lineage.
> As part of this test, add ReadableHook, an ExecuteWithHookContext that stores
> HookContexts in memory and makes them available for reading.
> Make LineageLogger methods static so they can be used elsewhere.
> 
> Sometimes a running query (originating in a Driver) will instantiate
> another Driver to run or compile another query. Because these Drivers
> shared a Session, the child Driver would accumulate Lineage information
> along with that of the parent Driver. For consistency a LineageState is
> passed to these child Drivers and stored in the new Driver's QueryState.
> 
> 
> Diffs
> -----
> 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniHS2.java f5ed735c1ec14dfee338e56020fa2629b168389d

>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java af9f193dc94e2e05caa88d965a34f4483c9d7069

>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 7d5aa8b179e536e25c41a8946e667f8dd5669e0f

>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java e7af5e004fb560b574b82f6d1b60517511802f37

>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e2f8c1f8012ad25114e279747e821b291c7f4ca6

>   ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1f0487f4f72ab18bcf876f45ad5758d83a7f001b

>   ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
262225fc202d4627652acfd77350e44b0284b3da 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java
bb1f4e50509e57a9d0b9e6793c1fc08baa4d2981 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java 7b617309f6b0d8a7ce0dea80ab1f790c2651b147

>   ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageLogger.java 2f764f8a29a9d41a7db013a949ffe3a8a9417d32

>   ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadableHook.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 68709b4d3baf15d78e60e948ccdef3df84f28cec

>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 1e577da82343a1b7361467fb662661f9c6642ec0

>   ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 29886ae7f97f8dae7116f4fc9a2417ab8f9dac0a

>   ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 7b067a0d45e33bc3347c43b050af933c296a9227

>   ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 504b0623142a6fa6cdb45a26b49f146e12ec2d7a

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java d7a83f775abca39b219f71aff88173a14ffaee9f

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java 4387c4297fee48d4c03e95d5a2fcb822ab480eeb

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 67739a1db9fc52a67f4f5ea7dba80fe0e95750c8

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 338c1856672f09bb7da35d2336ebb5b6f3fdc5a6

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/Generator.java e6c07713b24df719315d804f006151106eea9aed

>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1fd634c928a5384b09d97322c3ea785f518d73fe

>   ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java 065c7e50986872cd35386feee712f3452597d643

>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java 0c160acf46eb1eb07c5f04091099c1024e166638

>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java b6f1139fe1a78283277bf4d0c5224ab1d718c634

>   ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java cd75130d7c5f0b402f1b4331c57edc611eb4b2ed

>   ql/src/java/org/apache/hadoop/hive/ql/parse/IndexUpdater.java f31775ed942160da73344c4dca707da7b8c658a6

>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 238fbd60572ee5f7f8f6c4d5b2abce8f66c7e495

>   ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java d7a56e5846d5754dec5070d8c44443543a3695e4

>   ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java 498b6741c3f40b92ce3fb218e91e7809a17383f0

>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b66817f65f65b6aaf8dbc339a969b8e9e0565e9e

>   ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 7b2937032ab8dd57f8923e0a9e7aab4a92de55ee

>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java be33f380030ea8b416a4549c3947d767bba66356

>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4d2bcfa285dc08811106f3c234346efff22afd99

>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 604c8aee151a45cf942852a3644b5e79f779f353

>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 965044d9253585eeaeef50d7fe4fc4d818042df8

>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 28a33740b30b7be0057ce91de55a0407dd2f2cbf

>   ql/src/java/org/apache/hadoop/hive/ql/session/LineageState.java 056d6141d6239816699ed5f730cbd14e48d8d9bb

>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java bb6ddc6fa4667ac0e30994d0f9ee8b969542383c

>   ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java
340689255c738ea497bcd269463b8b8bc38cf34c 
>   ql/src/test/org/apache/hadoop/hive/ql/parse/TestGenTezWork.java 2c28c398ca49ba661df460c9f3e6d578c785d3ce

> 
> 
> Diff: https://reviews.apache.org/r/64193/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message