hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16037) with fetch optimization, the query runs after locks are released
Date Fri, 24 Feb 2017 22:52:44 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sergey Shelukhin updated HIVE-16037:
------------------------------------
    Description: 
Other assumptions may also be broken.
FetchTask.execute implementation is very curious - it does nothing, and the FetchTask that
actually runs the query is put in the same place as the one that normally fetches the results;
that is to say, the whole pipeline is run after Driver has "shut down" the query. That releases
logs before the query runs, and may also have other implications (for transactions, etc.?
I don't think simple fetch can be run for updates)
Adding a log line to TSOP process method, and running encrypted_table_insert from EncryptedHDFS
cli driver, I get:
{noformat}
2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
<PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
[no lines here]
2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
</PERFLOG method=runTasks start=1487976084521 end=1487976084521 duration=0 from=org.apache.hadoop.hive.ql.Driver>
...
2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
<PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] ZooKeeperHiveLockManager:
About to release lock for default/encrypted_table
2017-02-24T14:41:24,523 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] ZooKeeperHiveLockManager:
About to release lock for default
2017-02-24T14:41:24,525 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
</PERFLOG method=releaseLocks start=1487976084521 end=1487976084525 duration=4 from=org.apache.hadoop.hive.ql.Driver>
2017-02-24T14:41:24,525 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
</PERFLOG method=Driver.run start=1487976084394 end=1487976084525 duration=131 from=org.apache.hadoop.hive.ql.Driver>
2017-02-24T14:41:24,525 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] ql.Driver: Shutting
down query 
...
2017-02-24T14:41:24,532 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] mapred.FileInputFormat:
Total # of splits generated by getSplits: 1, TimeTaken: 4
2017-02-24T14:41:24,532 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.FetchOperator:
Creating fetchTask ...
...
2017-02-24T14:41:24,541  INFO [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.TableScanOperator:
TODO# calling process
2017-02-24T14:41:24,543  INFO [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.TableScanOperator:
TODO# calling process
2017-02-24T14:41:24,543  INFO [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.TableScanOperator:
Closing operator TS[0]
...
{noformat}


  was:
Other assumptions may also be broken.
FetchTask.execute implementation is very curious - it does nothing, and the FetchTask that
actually runs the query is put in the same place as the one that normally fetches the results;
that is to say, the whole pipeline is run after Driver has "shut down" the query. That releases
logs before the query runs, and may also have other implications.
Adding a log line to TSOP process method, and running encrypted_table_insert from EncryptedHDFS
cli driver, I get:
{noformat}
2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
<PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
[no lines here]
2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
</PERFLOG method=runTasks start=1487976084521 end=1487976084521 duration=0 from=org.apache.hadoop.hive.ql.Driver>
...
2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
<PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] ZooKeeperHiveLockManager:
About to release lock for default/encrypted_table
2017-02-24T14:41:24,523 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] ZooKeeperHiveLockManager:
About to release lock for default
2017-02-24T14:41:24,525 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
</PERFLOG method=releaseLocks start=1487976084521 end=1487976084525 duration=4 from=org.apache.hadoop.hive.ql.Driver>
2017-02-24T14:41:24,525 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
</PERFLOG method=Driver.run start=1487976084394 end=1487976084525 duration=131 from=org.apache.hadoop.hive.ql.Driver>
2017-02-24T14:41:24,525 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] ql.Driver: Shutting
down query 
...
2017-02-24T14:41:24,532 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] mapred.FileInputFormat:
Total # of splits generated by getSplits: 1, TimeTaken: 4
2017-02-24T14:41:24,532 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.FetchOperator:
Creating fetchTask ...
...
2017-02-24T14:41:24,541  INFO [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.TableScanOperator:
TODO# calling process
2017-02-24T14:41:24,543  INFO [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.TableScanOperator:
TODO# calling process
2017-02-24T14:41:24,543  INFO [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.TableScanOperator:
Closing operator TS[0]
...
{noformat}



> with fetch optimization, the query runs after locks are released 
> -----------------------------------------------------------------
>
>                 Key: HIVE-16037
>                 URL: https://issues.apache.org/jira/browse/HIVE-16037
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> Other assumptions may also be broken.
> FetchTask.execute implementation is very curious - it does nothing, and the FetchTask
that actually runs the query is put in the same place as the one that normally fetches the
results; that is to say, the whole pipeline is run after Driver has "shut down" the query.
That releases logs before the query runs, and may also have other implications (for transactions,
etc.? I don't think simple fetch can be run for updates)
> Adding a log line to TSOP process method, and running encrypted_table_insert from EncryptedHDFS
cli driver, I get:
> {noformat}
> 2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
<PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
> [no lines here]
> 2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
</PERFLOG method=runTasks start=1487976084521 end=1487976084521 duration=0 from=org.apache.hadoop.hive.ql.Driver>
> ...
> 2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
<PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
> 2017-02-24T14:41:24,521 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] ZooKeeperHiveLockManager:
About to release lock for default/encrypted_table
> 2017-02-24T14:41:24,523 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] ZooKeeperHiveLockManager:
About to release lock for default
> 2017-02-24T14:41:24,525 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
</PERFLOG method=releaseLocks start=1487976084521 end=1487976084525 duration=4 from=org.apache.hadoop.hive.ql.Driver>
> 2017-02-24T14:41:24,525 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] log.PerfLogger:
</PERFLOG method=Driver.run start=1487976084394 end=1487976084525 duration=131 from=org.apache.hadoop.hive.ql.Driver>
> 2017-02-24T14:41:24,525 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] ql.Driver:
Shutting down query 
> ...
> 2017-02-24T14:41:24,532 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] mapred.FileInputFormat:
Total # of splits generated by getSplits: 1, TimeTaken: 4
> 2017-02-24T14:41:24,532 DEBUG [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.FetchOperator:
Creating fetchTask ...
> ...
> 2017-02-24T14:41:24,541  INFO [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.TableScanOperator:
TODO# calling process
> 2017-02-24T14:41:24,543  INFO [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.TableScanOperator:
TODO# calling process
> 2017-02-24T14:41:24,543  INFO [50cde691-3602-4273-a4d9-e35f0c8b6001 main] exec.TableScanOperator:
Closing operator TS[0]
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message