drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query
Date Fri, 03 May 2019 17:58:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832702#comment-16832702 ]

ASF GitHub Bot commented on DRILL-7222:
---------------------------------------

kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-489185185
 
 
   @arina-ielchiieva 
   
   The motivation for this PR comes from the need for engineers to analyze queries as plans
change due to the introduction of statistics. My initial thought was to add another column,
but I think we already have a lot of columns. I tried to figure out which columns to trim,
but almost all of them seem relevant. I expect we will come back to doing something similar for
Resource Management as well, where we'll again need to compare estimates against actuals. So
adding more columns is not practical.
   
   Showing the estimates based on whether a planning decision was made using statistics is
not possible unless the profile JSON itself carries some hint that statistics were used.
   
   Also, I added the toggle button to provide a mechanism to hide the estimates by default
(another reason not to add a column). I'm worried that users will get the impression
that there are issues with Drill when estimates are wildly off. Even when the estimates are
sufficiently accurate (like NDV-based estimates vs. actual), most users don't have insight into
how the stats are being used.
   
   Users who do have insight into such things can use the estimates to tune parameters
(e.g. broadcast or selectivity thresholds) to force changes in sub-optimal plans.
Based on this, I thought we should go with the parentheses option for showing the estimated
row counts.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Visualize estimated and actual row counts for a query
> -----------------------------------------------------
>
>                 Key: DRILL-7222
>                 URL: https://issues.apache.org/jira/browse/DRILL-7222
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Web Server
>    Affects Versions: 1.16.0
>            Reporter: Kunal Khatua
>            Assignee: Kunal Khatua
>            Priority: Major
>              Labels: doc-impacting, user-experience
>             Fix For: 1.17.0
>
>
> With statistics in place, it would be useful to have the *estimated* rowcount alongside
> the *actual* rowcount in the query profile's operator overview.
> We can extract this from the Physical Plan section of the profile.
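
As a rough illustration of the extraction described in the issue, the sketch below pulls the estimated row count for each operator out of a physical plan text and renders it in parentheses next to an actual count. It is not the code from the PR. The plan line shape (an operator id such as 00-01, the operator name, and a "rowcount = ..." field), the sample plan, and the helper names estimated_rowcounts and format_rowcount are all assumptions made for this example.

```python
import re

# Assumed shape of one physical plan line (illustrative, not taken from the PR):
# 00-01      Project(a=[$0]) : rowType = RecordType(ANY a): rowcount = 25.0, cumulative cost = {...}, id = 2
PLAN_LINE = re.compile(r"^\s*(\d+)-(\d+)\s+(\w+).*?rowcount = ([0-9.Ee+-]+)")


def estimated_rowcounts(plan_text):
    """Map an operator id such as '00-01' to (operator name, estimated rows)."""
    estimates = {}
    for line in plan_text.splitlines():
        match = PLAN_LINE.match(line)
        if match:
            major, minor, name, rows = match.groups()
            estimates[f"{major}-{minor}"] = (name, float(rows))
    return estimates


def format_rowcount(actual, estimated=None):
    """Render the actual count, with the estimate in parentheses if one is given."""
    if estimated is None:
        return f"{actual:,}"
    return f"{actual:,} ({round(estimated):,})"


# Hypothetical plan text, written by hand for this example.
SAMPLE_PLAN = """\
00-00    Screen : rowType = RecordType(ANY a): rowcount = 25.0, cumulative cost = {...}, id = 3
00-01      Project(a=[$0]) : rowType = RecordType(ANY a): rowcount = 25.0, cumulative cost = {...}, id = 2
00-02        Scan(table=[[dfs, t]]) : rowType = RecordType(ANY a): rowcount = 1.0E7, cumulative cost = {...}, id = 1
"""

if __name__ == "__main__":
    for op_id, (name, rows) in estimated_rowcounts(SAMPLE_PLAN).items():
        print(op_id, name, rows)
    print(format_rowcount(9876543, 1.0e7))
```

Passing no estimate to format_rowcount leaves the actual count unadorned, which mirrors the idea of keeping the estimates hidden by default.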



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
