Date: Sat, 15 Mar 2014 22:32:43 +0000 (UTC)
From: "Lefty Leverenz (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-3777) add a property in the partition to figure out if stats are accurate

    [ https://issues.apache.org/jira/browse/HIVE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936327#comment-13936327 ]

Lefty Leverenz commented on HIVE-3777:
--------------------------------------

Documented in the wiki's Configuration Properties, please review:

{quote}
hive.stats.reliable

Default Value: false
Added In: Hive 0.10.0 with HIVE-1653
New Behavior In: Hive 0.13.0 with HIVE-3777

Whether queries will fail because statistics cannot be collected completely accurately.
If this is set to true, reading/writing from/into a partition or unpartitioned table may fail because the statistics could not be computed accurately. If it is set to false, the operation will succeed.

In Hive 0.13.0 and later, if hive.stats.reliable is false and statistics could not be computed correctly, the operation can still succeed and update the statistics but it sets a partition property "areStatsAccurate" to false. If the application needs accurate statistics, they can then be obtained in the background.
{quote}

Questions:
# Does an unpartitioned table have the "areStatsAccurate" property too?
# Does the new behavior happen when hive.stats.reliable is false, not true? (I ask because the jira description implies that this is a fix for the problem of long-running queries failing when statistics aren't accurate, but as I understand it the query doesn't fail when hive.stats.reliable is false. Perhaps I'm confused, so please make sure the wikidoc is correct.)

Quick ref:
* [Language Manual -- Configuration Properties: hive.stats.reliable |https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.reliable]

> add a property in the partition to figure out if stats are accurate
> -------------------------------------------------------------------
>
>                 Key: HIVE-3777
>                 URL: https://issues.apache.org/jira/browse/HIVE-3777
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Namit Jain
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.13.0
>
>         Attachments: HIVE-3777.2.patch, HIVE-3777.2.patch, HIVE-3777.3.patch, HIVE-3777.4.patch, HIVE-3777.5.patch, HIVE-3777.patch
>
>
> Currently, stats task tries to update the statistics in the table/partition
> being updated after the table/partition is loaded.
> In case of a failure to update these stats (for any reason), the operation
> either succeeds (writing inaccurate stats) or fails, depending on whether
> hive.stats.reliable is set to true. This can be bad for applications that do
> not always care about reliable stats, since the query may have taken a long
> time to execute and then fail eventually.
> Another property should be added to the partition: areStatsAccurate. If
> hive.stats.reliable is set to false, and stats could not be computed
> correctly, the operation would still succeed, update the stats, but set
> areStatsAccurate to false.
> If the application cares about accurate stats, they can be obtained in the
> background.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
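
For reference while reviewing the wikidoc, the behavior proposed in the issue description can be written out as a small truth table. This is only a sketch of the semantics as the description states them, not Hive code: the function name, the `Outcome` type and its fields are hypothetical. Two cases are assumptions that the description does not spell out: that areStatsAccurate is true when stats are computed correctly, and that no stats are written when the operation fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    """Hypothetical summary of what happens after the stats task runs."""
    operation_succeeds: bool
    stats_updated: bool
    # None means the property is not set because the operation failed.
    are_stats_accurate: Optional[bool]


def stats_task_outcome(stats_reliable: bool, stats_computed_ok: bool) -> Outcome:
    """Model the proposed HIVE-3777 behavior for one table/partition load.

    stats_reliable stands in for the value of hive.stats.reliable.
    stats_computed_ok says whether the stats could be computed correctly.
    """
    if stats_computed_ok:
        # Assumption: accurate stats are recorded with areStatsAccurate = true.
        return Outcome(True, True, True)
    if stats_reliable:
        # hive.stats.reliable = true: the operation may fail outright.
        # Assumption: nothing is written in that case.
        return Outcome(False, False, None)
    # hive.stats.reliable = false: succeed, update the stats, flag them inaccurate.
    return Outcome(True, True, False)


if __name__ == "__main__":
    for reliable in (True, False):
        for ok in (True, False):
            print(reliable, ok, stats_task_outcome(reliable, ok))
```

Read this way, the new partition property only comes into play in the last branch, when hive.stats.reliable is false, which is the reading that question 2 above asks to have confirmed.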