Date: Thu, 29 Jan 2015 06:42:34 +0000 (UTC)
From: "Prasanth Jayachandran (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-9471) Bad seek in uncompressed ORC, at row-group boundary.

    [ https://issues.apache.org/jira/browse/HIVE-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296461#comment-14296461 ]

Prasanth Jayachandran commented on HIVE-9471:
---------------------------------------------

[~mithun] Yeah. It makes sense to suppress the length stream as well if the dictionary is empty.

> Bad seek in uncompressed ORC, at row-group boundary.
> ----------------------------------------------------
>
>                 Key: HIVE-9471
>                 URL: https://issues.apache.org/jira/browse/HIVE-9471
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, Serializers/Deserializers
>   Affects Versions: 0.14.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-9471.2.patch, data.txt, orc_bad_seek_failure_case.hive, orc_bad_seek_setup.hive
>
>
> Under at least one specific condition, using index-filters in ORC causes a bad seek into the ORC row-group.
> {code:title=stacktrace}
> java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for column 2 kind DATA to 0 is outside of the data
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305)
> ...
> Caused by: java.lang.IllegalArgumentException: Seek in Stream for column 2 kind DATA to 0 is outside of the data
>     at org.apache.hadoop.hive.ql.io.orc.InStream$UncompressedStream.seek(InStream.java:112)
>     at org.apache.hadoop.hive.ql.io.orc.InStream$UncompressedStream.seek(InStream.java:96)
>     at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:310)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.seek(RecordReaderImpl.java:1596)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.seek(RecordReaderImpl.java:1337)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.seek(RecordReaderImpl.java:1852)
> {code}
> I'll attach the script to reproduce the problem herewith.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)