Date: Wed, 17 Oct 2012 21:42:04 +0000 (UTC)
From: "Kevin Wilfong (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Message-ID: <1731163308.60501.1350510124782.JavaMail.jiratomcat@arcas>
In-Reply-To: <1414843623.60484.1350510004576.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HIVE-3593) Output files of SMB join grow indefinitely

    [ https://issues.apache.org/jira/browse/HIVE-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478419#comment-13478419 ]

Kevin Wilfong commented on HIVE-3593:
-------------------------------------

Mildly related, if only one partition of the big table is used as input to the SMB join, there is no need to prefix the file name with the partition spec.

> Output files of SMB join grow indefinitely
> ------------------------------------------
>
>                 Key: HIVE-3593
>                 URL: https://issues.apache.org/jira/browse/HIVE-3593
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>
> The names of the output files of an SMB join are prefixed with the
> partition spec of the big table that was used to create them. The
> bucket number portion of the file name is updated to be the same length
> as the task ID. Since the task ID is the name of the file, this means
> that if the output of an SMB join is used as the big table of another
> SMB join, the output file names will grow by the length of the original
> partition spec. Compounded over successive joins, the file names can
> grow indefinitely.
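
The compounding described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not Hive's actual code: the function names, the `_` separator, the sample partition spec `ds=2012-10-17`, and the sample task ID `000000_0` are all hypothetical, and the same partition spec is reused at every step for simplicity. The `single_partition` flag models the suggestion in the comment that the prefix could be skipped when only one partition of the big table is read.

```python
def smb_output_name(partition_spec: str, task_id: str,
                    single_partition: bool = False) -> str:
    """Hypothetical naming rule: prefix the task ID with the partition spec.

    With single_partition=True the prefix is skipped, as suggested in the
    comment on the issue (an assumption about how a fix might behave).
    """
    if single_partition:
        return task_id
    return f"{partition_spec}_{task_id}"


def chain_smb_joins(partition_spec: str, task_id: str, rounds: int) -> list[str]:
    """Feed each join's output into the next join as the big table."""
    names = []
    for _ in range(rounds):
        name = smb_output_name(partition_spec, task_id)
        names.append(name)
        # Assumption taken from the description: the next join's task ID
        # is the whole file name, so the prefix is never stripped.
        task_id = name
    return names


names = chain_smb_joins("ds=2012-10-17", "000000_0", rounds=3)
for name in names:
    print(len(name), name)
```

In this model every chained join adds the length of the partition spec plus the separator to the file name, so the name length grows linearly with the number of joins and has no upper bound.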