Date: Wed, 27 Jul 2016 05:10:20 +0000 (UTC)
From: "Illya Yalovyy (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

     [ https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Illya Yalovyy updated HIVE-7239:
--------------------------------
    Status: Patch Available  (was: Open)

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7239
>                 URL: https://issues.apache.org/jira/browse/HIVE-7239
>             Project: Hive
>          Issue Type: Bug
>          Components: Indexing
>    Affects Versions: 2.1.0
>            Reporter: Sumit Kumar
>            Assignee: Illya Yalovyy
>         Attachments: HIVE-7239.2.patch, HIVE-7239.3.patch, HIVE-7239.4.patch, HIVE-7239.patch
>
>
> In case of sequence files, it's crucial that splits are calculated around the boundaries enforced by the input sequence file. However, by default Hadoop creates input splits depending on the configuration parameters, which may not match the boundaries of the input sequence file. Hive provides HiveIndexedInputFormat, which adds extra logic and recalculates the boundaries of each split according to the sequence file's boundaries.
> However, we noticed "over"-reporting from data backed by sequence files. We have sample data on which we experimented and fixed this bug, and we have verified the fix by comparing the query output for input in sequence file format, RC file format, and regular format. However, we have not been able to find the right place to include this as a unit test that would execute as part of the Hive tests. We tried writing a "clientpositive" test as part of the ql module, but the output seems quite verbose and I couldn't interpret it well. Can someone please review this change and advise on how to write a test that will execute as part of Hive testing?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
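
The split-boundary issue in the description can be illustrated with a small, simplified model. This is not Hive or Hadoop code and does not reproduce the attached patch. The function names, sync offsets, and split ranges below are hypothetical, chosen only to show why each split should own only the blocks whose sync point falls inside its byte range, and how a looser rule reads a block twice (the "over"-reporting symptom).

```python
from bisect import bisect_left


def owned_blocks(sync_offsets, split_start, split_end):
    """Blocks a split owns under the aligned rule.

    A block is identified by the byte offset of its sync point. The split
    [split_start, split_end) owns exactly the blocks whose sync offset lies
    inside that half-open range. sync_offsets must be sorted ascending.
    """
    lo = bisect_left(sync_offsets, split_start)
    hi = bisect_left(sync_offsets, split_end)
    return sync_offsets[lo:hi]


def overlapping_blocks(sync_offsets, file_length, split_start, split_end):
    """Blocks a split would read under a naive rule.

    Here a split reads every block that overlaps its byte range at all, so a
    block straddling a split boundary is read by both neighbouring splits.
    """
    result = []
    for i, begin in enumerate(sync_offsets):
        end = sync_offsets[i + 1] if i + 1 < len(sync_offsets) else file_length
        if begin < split_end and end > split_start:
            result.append(begin)
    return result


# Hypothetical layout: four blocks starting at these sync offsets.
sync_offsets = [0, 100, 250, 400]
file_length = 500
# Hypothetical splits chosen by configuration, ignoring the sync offsets.
splits = [(0, 200), (200, 500)]

aligned = [b for s, e in splits for b in owned_blocks(sync_offsets, s, e)]
naive = [
    b
    for s, e in splits
    for b in overlapping_blocks(sync_offsets, file_length, s, e)
]

print("aligned:", aligned)
print("naive:  ", naive)
```

In this model the aligned rule reads every block exactly once, while the naive rule reads the block at offset 100 from both splits because that block straddles the boundary at byte 200.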