Date: Fri, 7 Dec 2012 04:43:20 +0000 (UTC)
From: "Namit Jain (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Updated] (HIVE-3777) add a property in the partition to figure out if stats are accurate

     [ https://issues.apache.org/jira/browse/HIVE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3777:
-----------------------------

    Description:
Currently, the stats task tries to update the statistics of the table/partition after the table/partition is loaded. If updating these stats fails (for any reason), the operation either succeeds (writing inaccurate stats) or fails, depending on whether hive.stats.reliable is set to true.
This can be bad for applications that do not always care about reliable stats, since the query may have taken a long time to execute and then fail at the very end.

Another property should be added to the partition: areStatsAccurate. If hive.stats.reliable is set to false and stats could not be computed correctly, the operation would still succeed and update the stats, but set areStatsAccurate to false. If the application cares about accurate stats, they can be obtained in the background.

  was:
Currently, the stats task tries to update the statistics of the table/partition after the table/partition is loaded. If updating these stats fails (for any reason), the operation either succeeds (writing inaccurate stats) or fails, depending on whether hive.stats.reliable is set to true. This can be bad for applications that do not always care about reliable stats, since the query may have taken a long time to execute and then fail at the very end.

Another option should be added: hive.accurate.stats. If hive.stats.reliable is set to false and stats could not be computed correctly, the operation would still succeed and update the stats, but set hive.accurate.stats to false. If the application cares about accurate stats, they can be obtained in the background.

    Summary: add a property in the partition to figure out if stats are accurate  (was: add hive.stats.accurate in the partition)

> add a property in the partition to figure out if stats are accurate
> -------------------------------------------------------------------
>
>                 Key: HIVE-3777
>                 URL: https://issues.apache.org/jira/browse/HIVE-3777
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>
> Currently, the stats task tries to update the statistics of the
> table/partition after the table/partition is loaded.
> If updating these stats fails (for any reason), the operation either succeeds
> (writing inaccurate stats) or fails, depending on whether hive.stats.reliable
> is set to true. This can be bad for applications that do not always care about
> reliable stats, since the query may have taken a long time to execute and then
> fail at the very end.
> Another property should be added to the partition: areStatsAccurate. If
> hive.stats.reliable is set to false and stats could not be computed correctly,
> the operation would still succeed and update the stats, but set
> areStatsAccurate to false.
> If the application cares about accurate stats, they can be obtained in the
> background.
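
The behaviour proposed in this issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hive's implementation: the class StatsPolicySketch, the Outcome type, and the method finishStatsTask are hypothetical names, and storing areStatsAccurate (and an illustrative numRows value) as string-valued partition parameters is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the HIVE-3777 proposal; not Hive's actual code. */
final class StatsPolicySketch {

    /** Name of the proposed partition property (taken from the issue text). */
    static final String ARE_STATS_ACCURATE = "areStatsAccurate";

    /** Hypothetical result type: did the load succeed, and what was written to the partition. */
    static final class Outcome {
        final boolean succeeded;
        final Map<String, String> partitionParams;

        Outcome(boolean succeeded, Map<String, String> partitionParams) {
            this.succeeded = succeeded;
            this.partitionParams = partitionParams;
        }
    }

    /**
     * @param statsReliable    value of hive.stats.reliable
     * @param statsCollectedOk whether the stats were computed without error
     * @param numRows          row count gathered so far (possibly incomplete)
     */
    static Outcome finishStatsTask(boolean statsReliable, boolean statsCollectedOk, long numRows) {
        Map<String, String> params = new HashMap<>();

        // Strict mode: a stats failure fails the whole operation, nothing is written.
        if (!statsCollectedOk && statsReliable) {
            return new Outcome(false, params);
        }

        // Otherwise the operation succeeds and the stats are written,
        // flagged as inaccurate when collection failed.
        params.put("numRows", Long.toString(numRows)); // illustrative stat key
        params.put(ARE_STATS_ACCURATE, Boolean.toString(statsCollectedOk));
        return new Outcome(true, params);
    }

    public static void main(String[] args) {
        // hive.stats.reliable=false and stats collection failed.
        Outcome o = finishStatsTask(false, false, 42L);
        System.out.println(o.succeeded + " " + o.partitionParams);
    }
}
```

Under this sketch, an application that needs accurate stats could look for partitions whose areStatsAccurate value is false and recompute their statistics in the background, as the description suggests.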