Date: Mon, 25 Nov 2013 22:04:35 +0000 (UTC)
From: "Remus Rusanu (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

    [ https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831968#comment-13831968 ]

Remus Rusanu commented on HIVE-5817:
------------------------------------

My patch .4 addresses the issue in the following manner:
- Vector operators can implement the optional interface VectorizationContextRegion. If they do, they must provide a new vectorization context to be used by child operators. In my patch only VectorMapJoinOperator does so.
- The vectorizer walks up the stack of parent nodes to locate the first one (last one?) that created a vectorization context, and this is the vectorization context used to vectorize the current node. At the root of the stack there is a table scan that always creates a vectorization context.
- I made the VectorMapJoinOperator build the output VectorizedRowBatch using the VectorizedRowBatchCtx class, the same way the ORC and RC scanners do. This is more consistent and removes the need for the VectorizedRowBatch.buildBatch method (which was used only by VMJ).
- I added a simplified init to VectorizedRowBatchCtx, to be used by VMJ (or any other operator we decide on).

I have not enabled 'submit patch' yet because more code can be removed (the mapper scratch for the vector type map), code that was used only by VMJ to enable it to build the output batch. Using VectorizedRowBatchCtx makes all that code obsolete.

I tested the repro query and it passes fine, producing 100 rows (I assume they're the right ones...). I will do some more testing.

> column name to index mapping in VectorizationContext is broken
> --------------------------------------------------------------
>
>                 Key: HIVE-5817
>                 URL: https://issues.apache.org/jira/browse/HIVE-5817
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Sergey Shelukhin
>            Assignee: Remus Rusanu
>            Priority: Critical
>         Attachments: HIVE-5817-uniquecols.broken.patch, HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN b ON ... JOIN x ON ...;}} (distilled from a more complex query), which runs ok w/o vectorization. With vectorization, it will run ok for most ca, but for some ca it will fail (or can probably return incorrect results).
> That is because, when building the column-to-VRG-index map in VectorizationContext, the internal column name for ca that the first map join operator adds to the mapping may be the same as the internal name for cb that the 2nd one tries to add. The 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to output stuff, it retrieves the wrong index from the map by name, and then the wrong vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
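
The name collision described in the issue, and the per-operator context idea from the comment, can be sketched roughly as below. This is an illustration only and does not use Hive's real classes: `ColumnMapCollisionSketch`, `Context`, `addColumn`, and `read` are hypothetical names, and the "vectors" are plain string labels standing in for column vectors. It assumes the map skips a name that is already present, as the issue text says the second VMJ does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnMapCollisionSketch {

    /** Hypothetical stand-in for a name-to-index map plus the batch it describes. */
    static final class Context {
        final Map<String, Integer> indexByName = new HashMap<>();
        final List<String> vectors = new ArrayList<>(); // label of the data held at each index

        /** Adds a column unless the internal name is already mapped (assumed behaviour). */
        void addColumn(String internalName, String data) {
            if (indexByName.containsKey(internalName)) {
                return; // silently skipped: this is where the collision hides
            }
            indexByName.put(internalName, vectors.size());
            vectors.add(data);
        }

        /** Looks the index up by name, then fetches the vector at that index. */
        String read(String internalName) {
            return vectors.get(indexByName.get(internalName));
        }
    }

    /** One context shared by both joins: the second "_col0" is never registered. */
    static String readWithSharedContext() {
        Context shared = new Context();
        shared.addColumn("_col0", "a.ca"); // output of the first map join
        shared.addColumn("_col0", "b.cb"); // output of the second map join, same internal name
        return shared.read("_col0");
    }

    /** Each join owns its context, in the spirit of VectorizationContextRegion. */
    static String readWithRegionContext() {
        Context firstJoin = new Context();
        firstJoin.addColumn("_col0", "a.ca");
        Context secondJoin = new Context(); // fresh context handed to the second join's children
        secondJoin.addColumn("_col0", "b.cb");
        return secondJoin.read("_col0");
    }

    public static void main(String[] args) {
        System.out.println("shared context: " + readWithSharedContext());
        System.out.println("region context: " + readWithRegionContext());
    }
}
```

With the shared map, the lookup for the second join's `_col0` returns the first join's data (`a.ca`); with one context per join, it returns `b.cb`. The actual patch differs in detail, since child operators locate their context by walking up the parent stack.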