Date: Fri, 5 Feb 2016 01:34:39 +0000 (UTC)
From: "Charles Pritchard (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-7148) Use murmur hash to create bucketed tables

    [ https://issues.apache.org/jira/browse/HIVE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133477#comment-15133477 ]

Charles Pritchard commented on HIVE-7148:
-----------------------------------------

I could really use custom bucketing functions, as I want to use buckets instead of partitions based on a derived value.
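
To make the idea concrete, here is a minimal sketch of bucketing on a derived value with a pluggable hash function whose name is recorded alongside the table, so that a planner could check whether two tables were bucketed the same way before applying a bucketed join. Everything in it is hypothetical and for illustration only: `BucketedTable`, `HASH_FUNCTIONS`, `bucketed_join_applies`, and the rule that bucket counts must be multiples of each other are assumptions, not Hive's actual classes or metastore schema, and `zlib.crc32` merely stands in for a second hash function (it is not Murmur).

```python
import zlib
from dataclasses import dataclass


def java_style_hash(value: str) -> int:
    """Java-style polynomial string hash (31*h + char), truncated to 32 bits.

    Stand-in for a 'legacy' bucketing hash; an assumption for illustration.
    """
    h = 0
    for ch in value:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def crc32_hash(value: str) -> int:
    """Stand-in for a newer hash function. This is CRC-32, not Murmur."""
    return zlib.crc32(value.encode("utf-8")) & 0xFFFFFFFF


# Hypothetical registry: the name is what would be stored in table metadata.
HASH_FUNCTIONS = {"legacy": java_style_hash, "alt": crc32_hash}


@dataclass(frozen=True)
class BucketedTable:
    name: str
    num_buckets: int
    hash_name: str  # recorded with the table so readers know how rows were placed

    def bucket_for(self, key: str) -> int:
        """Map a (possibly derived) key to a bucket index in [0, num_buckets)."""
        h = HASH_FUNCTIONS[self.hash_name](key)
        return (h & 0x7FFFFFFF) % self.num_buckets


def bucketed_join_applies(left: BucketedTable, right: BucketedTable) -> bool:
    """Assumed rule: same hash function, and bucket counts that divide evenly."""
    if left.hash_name != right.hash_name:
        return False
    larger = max(left.num_buckets, right.num_buckets)
    smaller = min(left.num_buckets, right.num_buckets)
    return larger % smaller == 0


# Bucketing on a derived value: here, the domain part of an email address.
users = BucketedTable("users", 4, "legacy")
derived_key = "alice@example.org".split("@")[1]
bucket = users.bucket_for(derived_key)
```

Under these assumptions, two tables written with different hash names are never treated as join-compatible, even when their bucket counts match, because the same key may land in different buckets in each table.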

> Use murmur hash to create bucketed tables
> -----------------------------------------
>
>                 Key: HIVE-7148
>                 URL: https://issues.apache.org/jira/browse/HIVE-7148
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>
> HIVE-7121 introduced murmur hashing for queries that don't insert into bucketed tables. This was done to achieve better distribution of the data. The same should be done for bucketed tables as well, but this involves making sure we don't break backwards compat. This probably means that we have to store the partitioning function used in the metadata and use that to determine if SMB and bucketed map-join optimizations apply.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)