phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-4674) Incorrect stats if data size is less than guidepost width
Date Thu, 26 Apr 2018 00:08:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453267#comment-16453267 ]

James Taylor commented on PHOENIX-4674:
---------------------------------------

Thanks for the test, [~abhishek.chouhan]. I tweaked it slightly - the current behavior is
working as designed. The statistics reported are meant to be an upper bound on the amount
of data scanned. In this case, statistics have been collected, but we know we have less than
one guidepost width of data. So we use the guidepost width as the bytes scanned and estimate
the row count by dividing it by our row width estimate. We could use 0 as the estimate of
bytes/rows scanned, but the disadvantage is that if a very large guidepost width is
configured, there may actually be a sizeable amount of data to scan (and the user would be
given no indication of that).

> Incorrect stats if data size is less than guidepost width
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4674
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4674
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Abhishek Singh Chouhan
>            Assignee: Abhishek Singh Chouhan
>            Priority: Major
>         Attachments: PHOENIX-4674.patch
>
>
> For a small table, let's say with a single region smaller than the guidepost width, the
> stats after running UPDATE STATISTICS can be way off. This is because we get an empty
> guidepost for the region, and in BaseResultIterators we end up estimating the number of
> rows as guidepost width / estimated row size of the table. For a table with fewer than
> 100 rows and a guidepost width of 100 MB, if the estimated row size is 100 bytes, we end
> up estimating a million rows.
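
The arithmetic behind the numbers in this issue can be sketched as follows. This is a
minimal illustration, not Phoenix code: the function name estimate_scan and its parameters
are hypothetical, and it assumes 1 MB = 1024 * 1024 bytes. It only mirrors the rule
described in the thread, where a region with less than one guidepost width of data is
reported as a full guidepost width of bytes, and the row count is that width divided by the
estimated row size.

```python
from typing import Tuple


def estimate_scan(guidepost_width_bytes: int, est_row_size_bytes: int) -> Tuple[int, int]:
    """Hypothetical sketch of the upper-bound estimate discussed in the thread.

    Returns (estimated_bytes, estimated_rows) for a region that holds less than
    one guidepost width of data. Not taken from the Phoenix source.
    """
    if est_row_size_bytes <= 0:
        raise ValueError("estimated row size must be positive")
    # Bytes scanned are reported as the full guidepost width (an upper bound).
    est_bytes = guidepost_width_bytes
    # Rows are derived from that upper bound and the estimated row size.
    est_rows = guidepost_width_bytes // est_row_size_bytes
    return est_bytes, est_rows


# Numbers from the issue: 100 MB guidepost width, 100-byte estimated row size.
width = 100 * 1024 * 1024  # assumption: binary megabytes
est_bytes, est_rows = estimate_scan(width, 100)
print(est_bytes, est_rows)
```

With these inputs the row estimate comes out at roughly a million, even though the table
in the report holds fewer than 100 rows. Reporting 0 instead would avoid the overestimate
but would hide a potentially large scan when the guidepost width is configured very high.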



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
