Date: Thu, 22 Dec 2016 00:09:58 +0000 (UTC)
From: "Chao Sun (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-15477) Provide options to adjust filter stats when column stats are not available


     [ https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-15477:
----------------------------
    Attachment: HIVE-15477.2.patch

> Provide options to adjust filter stats when column stats are not available
> --------------------------------------------------------------------------
>
>                 Key: HIVE-15477
>                 URL: https://issues.apache.org/jira/browse/HIVE-15477
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 2.2.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-15477.1.patch, HIVE-15477.2.patch
>
>
> Currently, when column stats are not available, Hive assumes the "worst" case and sets the number of output rows to 1/2 of the number of input rows for each predicate expression. This can be inaccurate, especially when multiple predicates are chained by AND. We have found that in some cases this causes map joins to be ordered incorrectly and then fail with memory issues.
> One suggestion is to provide a config (such as {{hive.stats.filter.factor}}) that controls the fraction of rows emitted by a predicate expression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
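
To illustrate the arithmetic described in the issue, here is a minimal sketch of how a per-predicate factor compounds across predicates chained by AND. It is not Hive's code: the function name `estimate_filter_rows` and its parameters are hypothetical, and `hive.stats.filter.factor` is only the config name suggested in the issue, not a confirmed setting. The default of 0.5 mirrors the "1/2 of the input rows" behaviour described above.

```python
def estimate_filter_rows(input_rows: int, num_predicates: int, factor: float = 0.5) -> int:
    """Estimate output rows when no column stats exist (illustrative only).

    Each AND-ed predicate is assumed to keep `factor` of its input rows,
    so the estimate shrinks geometrically with the number of predicates.
    `factor` stands in for the suggested hive.stats.filter.factor option.
    """
    if input_rows < 0 or num_predicates < 0:
        raise ValueError("input_rows and num_predicates must be non-negative")
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in the range (0, 1]")
    # Apply the same factor once per predicate in the AND chain.
    return round(input_rows * factor ** num_predicates)


# Fixed 1/2 factor, three predicates chained by AND.
rows_default = estimate_filter_rows(1_000_000, 3)

# A factor of 1.0 leaves the row count unchanged.
rows_unfiltered = estimate_filter_rows(1_000_000, 3, factor=1.0)
```

With the fixed factor of 1/2, three AND-ed predicates reduce an estimate of 1,000,000 input rows to 125,000. An underestimate of this kind is what the issue says can lead to a wrong map join ordering; a configurable factor would let the estimate be made less aggressive.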