hive-dev mailing list archives

From "Lefty Leverenz (JIRA)" <>
Subject [jira] [Commented] (HIVE-3777) add a property in the partition to figure out if stats are accurate
Date Sat, 15 Mar 2014 22:32:43 GMT


Lefty Leverenz commented on HIVE-3777:

Documented in the wiki's Configuration Properties; please review:

hive.stats.reliable
Default Value: false
Added In: Hive 0.10.0 with HIVE-1653
New Behavior In:  Hive 0.13.0 with HIVE-3777

Whether queries will fail because statistics cannot be collected completely accurately. If
this is set to true, reading/writing from/into a partition or unpartitioned table may fail
because the statistics could not be computed accurately. If it is set to false, the operation
will succeed.

In Hive 0.13.0 and later, if hive.stats.reliable is false and statistics could not be computed
correctly, the operation can still succeed and update the statistics but it sets a partition
property "areStatsAccurate" to false. If the application needs accurate statistics, they can
then be obtained in the background.
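
The wording above can be restated as a small decision function. This is only a sketch of the documented behavior, not Hive's implementation: `Outcome`, `stats_task_outcome` and `stats_computed_ok` are hypothetical names, the value of "areStatsAccurate" after a successful stats update is an assumption, and only partitions are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    succeeds: bool                      # does the read/write operation succeed?
    are_stats_accurate: Optional[bool]  # "areStatsAccurate" value; None if the operation failed


def stats_task_outcome(stats_reliable: bool, stats_computed_ok: bool) -> Outcome:
    """Hypothetical model of the Hive 0.13.0 wording for hive.stats.reliable."""
    if stats_computed_ok:
        # Assumption: accurate stats are marked as accurate; the doc text does not say so.
        return Outcome(succeeds=True, are_stats_accurate=True)
    if stats_reliable:
        # hive.stats.reliable=true: the operation may fail when stats are inaccurate.
        return Outcome(succeeds=False, are_stats_accurate=None)
    # hive.stats.reliable=false: succeed, update stats, flag them as inaccurate.
    return Outcome(succeeds=True, are_stats_accurate=False)
```

Under this reading, the new property only matters in the last branch, which is the case the questions below ask about.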


# Does an unpartitioned table have the "areStatsAccurate" property too?
# Does the new behavior happen when hive.stats.reliable is false, not true?  (I ask because
the jira description implies that this is a fix for the problem of long-running queries failing
when statistics aren't accurate, but as I understand it the query doesn't fail when hive.stats.reliable
is false.  Perhaps I'm confused, so please make sure the wikidoc is correct.)

Quick ref:
* [Language Manual -- Configuration Properties:  hive.stats.reliable |]

> add a property in the partition to figure out if stats are accurate
> -------------------------------------------------------------------
>                 Key: HIVE-3777
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Namit Jain
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.13.0
>         Attachments: HIVE-3777.2.patch, HIVE-3777.2.patch, HIVE-3777.3.patch, HIVE-3777.4.patch, HIVE-3777.5.patch, HIVE-3777.patch
> Currently, the stats task tries to update the statistics of the table/partition
> being updated after the table/partition is loaded. If updating these stats
> fails (for any reason), the operation either succeeds
> (writing inaccurate stats) or fails, depending on whether hive.stats.reliable
> is set to true. This can be bad for applications that do not always care about
> reliable stats, since the query may have taken a long time to execute, only to
> fail at the end.
> Another property should be added to the partition: areStatsAccurate. If hive.stats.reliable
> is set to false and stats could not be computed correctly, the operation would
> still succeed and update the stats, but set areStatsAccurate to false.
> If the application cares about accurate stats, they can be obtained in the
> background.
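
To illustrate the "obtained in the background" step, here is a sketch of how an application might pick out the partitions whose stats need recomputing. It assumes the property is exposed as the string "true"/"false" in a per-partition parameter map; that storage format, the function name `partitions_needing_stats`, and the partition names are hypothetical and not taken from the patch.

```python
from typing import Dict, List


def partitions_needing_stats(partition_params: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of partitions whose "areStatsAccurate" property is "false".

    partition_params maps a partition name to its parameter map (assumed layout).
    Partitions without the property are skipped, since their state is unknown.
    """
    return sorted(
        name
        for name, params in partition_params.items()
        if params.get("areStatsAccurate", "").lower() == "false"
    )


# Hypothetical partition metadata for one table.
example = {
    "ds=2014-03-14": {"areStatsAccurate": "false"},
    "ds=2014-03-15": {"areStatsAccurate": "true"},
    "ds=2014-03-16": {},
}
stale = partitions_needing_stats(example)
```

Each partition returned this way could then be refreshed separately, for example with an ANALYZE TABLE ... COMPUTE STATISTICS statement, without blocking the original query.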

