Date: Wed, 15 Jun 2016 16:23:09 +0000 (UTC)
From: "Jesus Camacho Rodriguez (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-14018) Make IN clause row selectivity estimation customizable

[ https://issues.apache.org/jira/browse/HIVE-14018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-14018:
-------------------------------------------
    Status: Patch Available  (was: In Progress)

> Make IN clause row selectivity estimation customizable
> ------------------------------------------------------
>
>                 Key: HIVE-14018
>                 URL: https://issues.apache.org/jira/browse/HIVE-14018
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Minor
>         Attachments: HIVE-14018.patch
>
>
> After HIVE-13287 went in, we calculate IN clause estimates natively (instead of just dividing the incoming number of rows by 2). However, because the values of each column are assumed to be uniformly distributed, we might end up heavily underestimating or overestimating the resulting number of rows.
> This issue is to add a factor that multiplies the IN clause estimate so that we can alleviate this problem. The solution is not very elegant, but it is the best we can do until we have histograms to improve our estimates.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
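
To illustrate the idea in the issue description, here is a rough Python sketch of an IN clause row estimate under the uniform-distribution assumption, scaled by a multiplying factor. It is not the code in HIVE-14018.patch. The function names, the parameter `in_factor`, and the clamping to the input row count are all assumptions made for illustration, and the actual Hive logic may differ in its details.

```python
def estimate_in_rows_halving(num_rows: int) -> int:
    """Older heuristic mentioned in the issue: divide the incoming rows by 2."""
    return num_rows // 2


def estimate_in_rows(num_rows: int, ndv: int, in_list_size: int,
                     in_factor: float = 1.0) -> int:
    """Estimate rows passing `col IN (v1, ..., vk)`.

    Assumes values are uniformly distributed, so each of the `ndv` distinct
    values accounts for num_rows / ndv rows. `in_factor` is a hypothetical
    tuning knob that multiplies the estimate.
    """
    if num_rows <= 0 or ndv <= 0 or in_list_size <= 0:
        return 0
    # Fraction of distinct values covered by the IN list, capped at 1.
    selectivity = min(1.0, in_list_size / ndv)
    estimate = num_rows * selectivity * in_factor
    # Assumption: never report more rows than arrived at the filter.
    return int(min(num_rows, round(estimate)))


if __name__ == "__main__":
    print(estimate_in_rows_halving(1000))                 # old heuristic
    print(estimate_in_rows(1000, 100, 5))                 # uniform estimate
    print(estimate_in_rows(1000, 100, 5, in_factor=2.0))  # scaled up for skew
```

If the values in the IN list are known to be more frequent than average, a factor above 1 raises the estimate. A factor below 1 lowers it when they are rarer. This is a coarse correction until histograms are available.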