Date: Mon, 29 Jul 2013 16:33:50 +0000 (UTC)
From: "Brock Noland (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

     [ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated HIVE-4838:
-------------------------------

    Attachment: HIVE-4838.patch

Rebased

> Refactor MapJoin HashMap code to improve testability and readability
> --------------------------------------------------------------------
>
>                 Key: HIVE-4838
>                 URL: https://issues.apache.org/jira/browse/HIVE-4838
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Brock Noland
>            Assignee: Brock Noland
>         Attachments: HIVE-4838.patch, HIVE-4838.patch,
>                      HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch
>
>
> MapJoin is an essential component for high-performance joins in Hive, and the current code has done great service for many years. However, the code is showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key and Row classes.
> * The API of a logical "Table Container" is not defined, so it is unclear which APIs HashMapWrapper needs to expose. Additionally, HashMapWrapper has many unused public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and implement the table container. Ideally these logical units would be separated.
> * HashTableSinkObjectCtx has unused fields and unused methods.
> * CommonJoinOperator and its children use ArrayList on the left-hand side when only List is required.
> * There are unused classes (MRU, DCLLItem) and classes that duplicate functionality (MapJoinSingleKey and MapJoinDoubleKeys).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
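
For illustration only, two of the issues listed in the description (the undefined "Table Container" API and ArrayList used where List would do) could be addressed along the lines of the sketch below. The names TableContainer and HashMapTableContainer, and all method signatures, are hypothetical and are not taken from the attached patch or from Hive itself. The sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: the narrow API a logical "table container" might expose.
// Callers depend on this type only, so the implementation's public surface stays small.
interface TableContainer<K, V> {
    void put(K key, V row);

    // Declared as List rather than ArrayList: callers only need the interface.
    List<V> get(K key);

    // Number of distinct keys held by the container.
    int size();
}

// Hypothetical in-memory implementation. In this sketch, serialization and
// memory-bound checks are assumed to live in separate classes, not here.
final class HashMapTableContainer<K, V> implements TableContainer<K, V> {
    // Left-hand side uses Map and List; the concrete types appear only at construction.
    private final Map<K, List<V>> rows = new HashMap<>();

    @Override
    public void put(K key, V row) {
        rows.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    @Override
    public List<V> get(K key) {
        List<V> found = rows.get(key);
        // Return a read-only view so callers cannot mutate internal state.
        return found == null ? Collections.<V>emptyList() : Collections.unmodifiableList(found);
    }

    @Override
    public int size() {
        return rows.size();
    }
}
```

Because callers see only the interface, a test could substitute any other TableContainer implementation without touching the join operator code.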