hive-dev mailing list archives

From "Lefty Leverenz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3777) add a property in the partition to figure out if stats are accurate
Date Sat, 15 Mar 2014 22:32:43 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936327#comment-13936327 ]

Lefty Leverenz commented on HIVE-3777:
--------------------------------------

Documented in the wiki's Configuration Properties, please review:

{quote}
hive.stats.reliable
Default Value: false
Added In: Hive 0.10.0 with HIVE-1653
New Behavior In:  Hive 0.13.0 with HIVE-3777

Whether queries will fail because statistics cannot be collected completely accurately. If
this is set to true, reading/writing from/into a partition or unpartitioned table may fail
because the statistics could not be computed accurately. If it is set to false, the operation
will succeed.

In Hive 0.13.0 and later, if hive.stats.reliable is false and statistics could not be computed
correctly, the operation can still succeed and update the statistics but it sets a partition
property "areStatsAccurate" to false. If the application needs accurate statistics, they can
then be obtained in the background.
{quote}
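
The behavior described in the quoted text can be modeled roughly as below. This is only an illustrative sketch of the documented semantics, not Hive's actual code: the function name finish_stats_task, the exception StatsCollectionError, and the use of Python booleans for the property value are all assumptions made for the example. Only hive.stats.reliable and "areStatsAccurate" come from the text above.

```python
class StatsCollectionError(Exception):
    """Hypothetical error: the operation fails because stats were not accurate."""


def finish_stats_task(stats_reliable: bool, stats_computed_ok: bool) -> dict:
    """Model the outcome of an operation after its stats task has run.

    stats_reliable    -- stands in for the hive.stats.reliable setting
    stats_computed_ok -- whether statistics were computed accurately
    Returns the (hypothetical) partition properties written on success.
    """
    if stats_computed_ok:
        # Accurate stats: the operation succeeds regardless of the setting.
        return {"areStatsAccurate": True}
    if stats_reliable:
        # hive.stats.reliable=true: the operation may fail outright.
        raise StatsCollectionError("statistics could not be computed accurately")
    # hive.stats.reliable=false (Hive 0.13.0 and later, per the text above):
    # the operation succeeds, but the partition is flagged as inaccurate.
    return {"areStatsAccurate": False}


if __name__ == "__main__":
    print(finish_stats_task(stats_reliable=False, stats_computed_ok=False))
```

Under this reading, an application that needs accurate statistics would check the flag after the operation and recompute the statistics in the background when it is false.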

Questions: 

# Does an unpartitioned table have the "areStatsAccurate" property too?
# Does the new behavior happen when hive.stats.reliable is false, not true?  (I ask because
the jira description implies that this is a fix for the problem of long-running queries failing
when statistics aren't accurate, but as I understand it the query doesn't fail when hive.stats.reliable
is false.  Perhaps I'm confused, so please make sure the wikidoc is correct.)

Quick ref:
* [Language Manual -- Configuration Properties:  hive.stats.reliable |https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.reliable]

> add a property in the partition to figure out if stats are accurate
> -------------------------------------------------------------------
>
>                 Key: HIVE-3777
>                 URL: https://issues.apache.org/jira/browse/HIVE-3777
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Namit Jain
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.13.0
>
>         Attachments: HIVE-3777.2.patch, HIVE-3777.2.patch, HIVE-3777.3.patch, HIVE-3777.4.patch,
>                      HIVE-3777.5.patch, HIVE-3777.patch
>
>
> Currently, the stats task tries to update the statistics in the table/partition
> being updated after the table/partition is loaded. In case of a failure to
> update these stats (for any reason), the operation either succeeds
> (writing inaccurate stats) or fails depending on whether hive.stats.reliable
> is set to true. This can be bad for applications that do not always care about
> reliable stats, since the query may have taken a long time to execute and then
> fail eventually.
> Another property should be added to the partition: areStatsAccurate. If
> hive.stats.reliable is set to false, and stats could not be computed correctly,
> the operation would still succeed, update the stats, but set areStatsAccurate
> to false.
> If the application cares about accurate stats, they can be obtained in the
> background.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
