Reply-To: dev@hive.apache.org
Content-Type: multipart/alternative; boundary="===============8617393176908496486=="
MIME-Version: 1.0
Subject: Re: Review Request 13059: HIVE-4850 Implement vector mode map join
From: "Remus Rusanu"
To: "Jitendra Pandey", "Eric Hanson"
Cc: "Remus Rusanu", "hive"
Date: Thu, 03 Oct 2013 14:17:56 -0000
Message-ID: <20131003141756.28588.44905@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org
X-ReviewGroup: hive
X-ReviewRequest-URL: https://reviews.apache.org/r/13059/
References: <20130730111118.3449.4095@reviews.apache.org>
In-Reply-To: <20130730111118.3449.4095@reviews.apache.org>

--===============8617393176908496486==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated
e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
-----------------------------------------------------------

(Updated Oct. 3, 2013, 2:17 p.m.)


Review request for hive, Eric Hanson and Jitendra Pandey.


Bugs: HIVE-4850
    https://issues.apache.org/jira/browse/HIVE-4850


Repository: hive-git


Description
-------

This is not the final iteration, but I thought it would be easier to discuss it with a review. This implementation works and handles multiple aliases and multiple values per key. It uses the existing hash tables saved by the local task for the map join, which are row-mode hash tables (they have row-mode keys and store row-mode writable object values).

Going forward we should avoid the conversions whose cost grows with the size of the big table: converting big-table keys to row mode and converting small-table values to vector data. This would require either converting the hash tables on the fly to vector-friendly ones (when loaded) or changing the local task hashtable sink to create a vectorization-friendly hash table. The first approach may have memory consumption problems: potentially two hash tables end up in memory, so we would have to stream the transformation or transform while reading from the serialized format... nasty.

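To make the cost described above concrete, here is a minimal Java sketch of probing a row-mode hash table from a vectorized key column. It is not taken from the patch: the class RowModeProbeSketch, the method probe, and the use of plain java.util collections in place of Hive's hash table and writable classes are all assumptions made for illustration. It only shows where the per-row key conversion and the per-match value copy happen, and how one key can yield several output rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hive code: every name here is made up for illustration.
final class RowModeProbeSketch {

    // Probes a row-mode hash table with one vectorized key column.
    // keyColumn holds primitive big-table keys; size is the number of valid rows.
    static List<Object[]> probe(long[] keyColumn, int size,
                                Map<List<Object>, List<Object[]>> smallTable) {
        List<Object[]> joined = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            // Conversion 1: box the primitive and wrap it as a row-mode key.
            // This runs once per big-table row.
            List<Object> rowKey = Arrays.<Object>asList(Long.valueOf(keyColumn[i]));

            List<Object[]> matches = smallTable.get(rowKey);
            if (matches == null) {
                continue; // inner-join semantics: unmatched rows are dropped
            }
            // Conversion 2: each matching row-mode value is copied out next to the key.
            // A key may map to several values, so one input row can emit several rows.
            for (Object[] value : matches) {
                joined.add(new Object[] {keyColumn[i], value[0]});
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Small table: key 1 maps to two values.
        Map<List<Object>, List<Object[]>> smallTable = new HashMap<>();
        List<Object[]> valuesForKey1 = new ArrayList<>();
        valuesForKey1.add(new Object[] {"a"});
        valuesForKey1.add(new Object[] {"b"});
        smallTable.put(Arrays.<Object>asList(1L), valuesForKey1);

        // Big-table batch with three rows; key 2 has no match.
        long[] keyColumn = {1L, 2L, 1L};
        for (Object[] row : probe(keyColumn, keyColumn.length, smallTable)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

A vectorization-friendly hash table would aim to accept the primitive key directly and hand back values already laid out for column vectors, removing both conversions from the inner loop.
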
Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8ab5395
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 02ebe14
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java df1c5a6
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b

Diff: https://reviews.apache.org/r/13059/diff/


Testing
-------

Manually ran some join queries on the alltypes_orc table.


Thanks,

Remus Rusanu

--===============8617393176908496486==--