hive-issues mailing list archives

From "anishek (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-15473) Progress Bar on Beeline client
Date Fri, 27 Jan 2017 07:17:25 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842353#comment-15842353
] 

anishek edited comment on HIVE-15473 at 1/27/17 7:16 AM:
---------------------------------------------------------

There are a few observations / limitations that [~thejas] cited while reviewing this. Writing
down the reasoning here, along with the steps for how we can move forward.

Given that we use SynchronizedHandler for the client on the beeline side, only one operation /
API call at a time can be in execution from a single beeline session to hiveserver2. The current
flow of how the progress bar is updated on the client side is:

Thread 1 -- does statement execution: This is achieved by calling GetOperationStatus for the
operation from beeline until the execution of the operation is complete. The server side implementation
of GetOperationStatus uses a timeout mechanism (it waits for the query execution to finish)
before it sends the status to the client. The timeout value is decided by a step function, which
for long running queries can lead to a wait time of approximately 5 seconds per call to
GetOperationStatus.
Thread 2 -- prints query logs and progress logs.
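
To make the step function idea concrete, here is a minimal sketch of how elapsed query time could map to the server side wait. The class name, thresholds and wait values are assumptions for illustration only and are not taken from HiveServer2; only the ceiling of roughly 5 seconds for long running queries comes from the description above.

```java
import java.util.concurrent.TimeUnit;

final class StepTimeoutSketch {
    // Hypothetical step function: the longer the query has been running,
    // the longer the server blocks before answering a status call.
    // All thresholds and values below are assumed, not HiveServer2's.
    static long waitMillis(long elapsedMillis) {
        if (elapsedMillis < TimeUnit.SECONDS.toMillis(10)) {
            return 500L;
        }
        if (elapsedMillis < TimeUnit.SECONDS.toMillis(60)) {
            return 2_000L;
        }
        return 5_000L; // long running queries: about 5 seconds per call
    }

    public static void main(String[] args) {
        long[] samples = {0L, 30_000L, 600_000L};
        for (long elapsed : samples) {
            System.out.println(elapsed + " ms elapsed -> wait up to "
                + waitMillis(elapsed) + " ms");
        }
    }
}
```

With a shape like this, any client call queued behind a status call on a long running query waits for the full top step before it gets its turn.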

*Problem Space:*
# Since the client synchronizes the various API calls, only one API call from either Thread 1
or Thread 2 is executed at a time. Projecting a concurrent execution capability in the beeline
code therefore seems misleading, and with the current patch the progress bar / query log
updates can be delayed by 5 seconds or more ( _I don't think we can avoid this anyway, as I
will discuss later_ ).
# Additionally, since no *order* is maintained among threads requesting synchronization on an
object, there is a possibility that Thread 1 gets the lock on the object again without
Thread 2 getting a chance to obtain it, thus leading to long delays in updating the
Query Log or Progress log ( _I am not sure how this would happen for the use case of long running
queries: while Thread 1 is executing, Thread 2 would already be blocked on the synchronization
of the object. Once Thread 1 completes, and before it comes around the while loop in_
{code}
HiveStatement.waitForOperationToComplete()
{code}
_Thread 2 should start executing. It seems highly improbable that Thread 1 completes,
executes additional statements and gets the lock again before Thread 2 gets a chance to acquire
the lock_ )
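
The serialization described in 1. can be reproduced with a small stand-alone sketch. This is not the Hive SynchronizedHandler code; the Client interface, SlowClient and all other names are hypothetical. It only shows that wrapping a client in a proxy that synchronizes on the target lets one call run at a time, no matter how many threads use it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

final class SerializedClientSketch {
    // Hypothetical stand-in for the client used by beeline.
    interface Client {
        void getOperationStatus();
        void fetchLogs();
    }

    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Fake client that records how many calls overlap in time.
    static final class SlowClient implements Client {
        private void call(long millis) {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inFlight.decrementAndGet();
            }
        }
        @Override public void getOperationStatus() { call(50); } // slow status call
        @Override public void fetchLogs() { call(5); }           // quick log fetch
    }

    // Every call through the proxy takes the same monitor, so calls are serialized.
    static Client synchronizedClient(Client target) {
        InvocationHandler handler = (proxy, method, args) -> {
            synchronized (target) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();
                }
            }
        };
        return (Client) Proxy.newProxyInstance(
            Client.class.getClassLoader(), new Class<?>[] {Client.class}, handler);
    }

    // Runs a status thread and a log thread; returns the peak number of overlapping calls.
    static int maxConcurrentCalls() {
        inFlight.set(0);
        maxInFlight.set(0);
        Client client = synchronizedClient(new SlowClient());
        Thread statusThread = new Thread(() -> {
            for (int i = 0; i < 3; i++) client.getOperationStatus();
        });
        Thread logThread = new Thread(() -> {
            for (int i = 0; i < 3; i++) client.fetchLogs();
        });
        statusThread.start();
        logThread.start();
        try {
            statusThread.join();
            logThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent calls: " + maxConcurrentCalls());
    }
}
```

The reported maximum is 1: the two threads never overlap inside the client. Which waiting thread gets the monitor next is not specified for synchronized blocks, which is the ordering concern in 2.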

So in summary:
* Avoid multi-threaded code in beeline for interactions with hiveserver2, as no concurrency
is supported by the Thrift protocol, unless we move to ThriftHttpCliService using an HTTP based
connection, or use a NonBlockingThrift server for the binary protocol on the server side.
* Address the issue of responsiveness if we can.

*Solution Space:*
Since concurrent execution is not supported, programming anything to that effect should be
avoided in the beeline client. Hence, we strive to remove the multi-threaded code from the beeline
side, in effect merging the query log and progress bar log into the GetOperationStatus
API. This would still not address the issue of responsiveness indicated in 1. above, as
GetOperationStatus will still apply the wait time before responding to calls from the beeline side,
unless we decide to remove it or reduce the wait time to a default value of, say, 500 milliseconds.
I am not sure why the step function is used -- _to prevent the server from wasting CPU resources on
non-critical operations?_ . This will address 2. above though, since we are going to get all
the information in a single call.
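
A rough sketch of what the single-threaded flow could look like on the client side, assuming each status response also carries the new log lines and the progress text. StatusResponse, StatusSource and the canned responses are hypothetical and do not reflect the actual Thrift types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class MergedStatusPollSketch {
    enum State { RUNNING, FINISHED }

    // Hypothetical merged response: state plus log lines and progress text.
    static final class StatusResponse {
        final State state;
        final List<String> newLogLines;
        final String progressText;

        StatusResponse(State state, List<String> newLogLines, String progressText) {
            this.state = state;
            this.newLogLines = newLogLines;
            this.progressText = progressText;
        }
    }

    // Stand-in for the blocking GetOperationStatus call.
    interface StatusSource {
        StatusResponse getOperationStatus();
    }

    // One thread, one call per iteration: logs and progress arrive together,
    // so there is no second thread competing for the client lock.
    static List<String> waitForCompletion(StatusSource source) {
        List<String> printed = new ArrayList<>();
        StatusResponse response;
        do {
            response = source.getOperationStatus();
            printed.addAll(response.newLogLines);
            printed.add(response.progressText);
        } while (response.state != State.FINISHED);
        return printed;
    }

    // Drives the loop with two canned responses instead of a real server.
    static List<String> demo() {
        Iterator<StatusResponse> canned = Arrays.asList(
            new StatusResponse(State.RUNNING,
                Arrays.asList("log: compiling"), "progress: 40%"),
            new StatusResponse(State.FINISHED,
                Collections.<String>emptyList(), "progress: 100%")).iterator();
        return waitForCompletion(canned::next);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The update frequency in this shape is still bounded by how long each status call blocks on the server, which is the responsiveness issue from 1.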

*Implementation Considerations:*
# Merge the QueryLog and ProgressBarLog request / response into GetOperationStatus.
# To get this working we have to extend HiveStatement to include a few non-JDBC-compliant setters
(one interface for displaying the progress bar, the other for displaying query logs) -- the default
implementations for these will be _do nothing_ implementations.
# Have setters on hive statement for both the interfaces, used by beeline to provide the required
implementations.
# As part of the hive statement execute(*) call, we create the appropriate request if custom
implementations of the interfaces are provided above.
# We might need to create an additional function signature for GetOperationStatus for
backward compatibility reasons.
# _Not related to above_ : make sure we pass the vertex progress as a string (for progress bar
display) and the query progress as a custom enum for decision making (with implementations on the
server side to map from the execution engine based state to our generic enum state).
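
Items 2 to 4 and 6 could be sketched roughly as below. Everything here is an assumption for illustration: ProgressDisplay, QueryLogDisplay, SketchStatement, QueryState and the engine state strings are hypothetical names and values, not the HiveStatement API.

```java
import java.util.List;

final class StatementHooksSketch {
    // Hypothetical non-JDBC hooks that beeline could implement.
    interface ProgressDisplay { void show(String vertexProgress); }
    interface QueryLogDisplay { void show(List<String> logLines); }

    // Generic query state used for decision making on the client.
    enum QueryState { RUNNING, SUCCEEDED, FAILED, UNKNOWN }

    // Server side mapping from an engine specific state name to the generic enum.
    // The state names handled here are assumed examples.
    static QueryState fromEngineState(String engineState) {
        if (engineState == null) {
            return QueryState.UNKNOWN;
        }
        switch (engineState) {
            case "RUNNING": return QueryState.RUNNING;
            case "SUCCEEDED": return QueryState.SUCCEEDED;
            case "FAILED":
            case "KILLED": return QueryState.FAILED;
            default: return QueryState.UNKNOWN;
        }
    }

    static final class SketchStatement {
        // Default "do nothing" implementations.
        static final ProgressDisplay NO_PROGRESS = progress -> { };
        static final QueryLogDisplay NO_LOGS = lines -> { };

        private ProgressDisplay progressDisplay = NO_PROGRESS;
        private QueryLogDisplay queryLogDisplay = NO_LOGS;

        void setProgressDisplay(ProgressDisplay display) { this.progressDisplay = display; }
        void setQueryLogDisplay(QueryLogDisplay display) { this.queryLogDisplay = display; }

        // execute(*) would only ask the server for data that somebody will display.
        boolean shouldRequestProgress() { return progressDisplay != NO_PROGRESS; }
        boolean shouldRequestLogs() { return queryLogDisplay != NO_LOGS; }

        // Called with the contents of each status response.
        void onStatus(String vertexProgress, List<String> logLines) {
            progressDisplay.show(vertexProgress);
            queryLogDisplay.show(logLines);
        }
    }

    public static void main(String[] args) {
        SketchStatement statement = new SketchStatement();
        statement.setProgressDisplay(System.out::println);
        statement.onStatus("Map 1: 3/10", List.of());
        System.out.println(fromEngineState("KILLED"));
    }
}
```

A plain JDBC caller that never touches the setters keeps the no-op defaults, so nothing extra is requested or printed for it.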
 
If we are too worried about the responsiveness of the progress bar, or if *2. in Problem Space*
is a major impediment to hive usage, we should go with the new implementation proposal;
otherwise we just additionally implement *6. in Implementation Considerations*.





> Progress Bar on Beeline client
> ------------------------------
>
>                 Key: HIVE-15473
>                 URL: https://issues.apache.org/jira/browse/HIVE-15473
>             Project: Hive
>          Issue Type: Improvement
>          Components: Beeline, HiveServer2
>    Affects Versions: 2.1.1
>            Reporter: anishek
>            Assignee: anishek
>            Priority: Minor
>         Attachments: HIVE-15473.2.patch, HIVE-15473.3.patch, HIVE-15473.4.patch, HIVE-15473.5.patch,
> screen_shot_beeline.jpg
>
>
> Hive CLI allows showing a progress bar for the tez execution engine, as shown in https://issues.apache.org/jira/secure/attachment/12678767/ux-demo.gif
> It would be great to have a similar progress bar displayed when the user is connecting via
> the beeline command line client as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
