hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-11033) BloomFilter index is not honored by ORC reader
Date Thu, 18 Jun 2015 08:04:01 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591438#comment-14591438 ]

Prasanth Jayachandran edited comment on HIVE-11033 at 6/18/15 8:03 AM:
-----------------------------------------------------------------------

Minor change to include all bloom filter columns.


was (Author: prasanth_j):
Minor change to change to include all bloom filter columns.

> BloomFilter index is not honored by ORC reader
> ----------------------------------------------
>
>                 Key: HIVE-11033
>                 URL: https://issues.apache.org/jira/browse/HIVE-11033
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Allan Yan
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-11033.2.patch, HIVE-11033.patch
>
>
> There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class that prevents the bloom filter index saved in the ORC file from being used. The root cause is that the bloomFilterIndices field declared in the SargApplier class shadows the one defined in its enclosing class, so SargApplier works with its own, never-populated array.
> Therefore, in RecordReaderImpl.pickRowGroups():
> {code}
>   protected boolean[] pickRowGroups() throws IOException {
>     // if we don't have a sarg or indexes, we read everything
>     if (sargApp == null) {
>       return null;
>     }
>     readRowIndex(currentStripe, included, sargApp.sargColumns);
>     return sargApp.pickRowGroups(stripes.get(currentStripe), indexes);
>   }
> {code}
> The bloomFilterIndices array populated by readRowIndex() is never seen by the sargApp object. One solution is to make SargApplier.bloomFilterIndices a reference to its enclosing class's counterpart.
> {noformat}
> 18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original
> 174d173
> <     bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
> 178c177
> <           sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices);
> ---
> >           sarg, options.getColumnNames(), strideRate, types, included.length);
> 204a204
> >     bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
> 673c673
> <         List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) {
> ---
> >         List<OrcProto.Type> types, int includedCount) {
> 677c677
> <       this.bloomFilterIndices = bloomFilterIndices;
> ---
> >       bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
> {noformat}
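
A minimal, self-contained Java sketch of the aliasing problem described above, under a simplified model: BloomIndex, BuggyApplier, FixedApplier and readRowIndex here are hypothetical stand-ins, not Hive's actual classes or signatures. It shows that an array allocated inside the applier's constructor never sees entries the reader writes later, while an array passed in by reference (the approach taken in the diff) does.

```java
// Hypothetical, simplified model of the HIVE-11033 bug.
// None of these names are Hive's real API; they only mirror the structure.
public class BloomFilterSharingDemo {

  // Stand-in for OrcProto.BloomFilterIndex.
  static final class BloomIndex {
    final String column;
    BloomIndex(String column) { this.column = column; }
  }

  // Buggy variant: the applier allocates its own array, so entries the
  // reader fills in later land in a different array and are never seen.
  static final class BuggyApplier {
    final BloomIndex[] bloomFilterIndices;
    BuggyApplier(int columnCount) {
      this.bloomFilterIndices = new BloomIndex[columnCount];
    }
    boolean hasIndex(int col) { return bloomFilterIndices[col] != null; }
  }

  // Fixed variant: the applier keeps a reference to the reader's array,
  // so in-place writes by the reader are visible here.
  static final class FixedApplier {
    final BloomIndex[] bloomFilterIndices;
    FixedApplier(BloomIndex[] shared) { this.bloomFilterIndices = shared; }
    boolean hasIndex(int col) { return bloomFilterIndices[col] != null; }
  }

  // Simulates the reader's readRowIndex(): populates its own array in place.
  static void readRowIndex(BloomIndex[] target, int col) {
    target[col] = new BloomIndex("col" + col);
  }

  static boolean buggyApplierSeesIndex() {
    BloomIndex[] readerIndices = new BloomIndex[3];
    BuggyApplier applier = new BuggyApplier(readerIndices.length);
    readRowIndex(readerIndices, 1);
    return applier.hasIndex(1);
  }

  static boolean fixedApplierSeesIndex() {
    // As in the patch, the array is allocated before the applier is built.
    BloomIndex[] readerIndices = new BloomIndex[3];
    FixedApplier applier = new FixedApplier(readerIndices);
    readRowIndex(readerIndices, 1);
    return applier.hasIndex(1);
  }

  public static void main(String[] args) {
    System.out.println("buggy sees index: " + buggyApplierSeesIndex()); // false
    System.out.println("fixed sees index: " + fixedApplierSeesIndex()); // true
  }
}
```

Passing the array into the constructor, rather than allocating a fresh one, is what makes the later in-place writes visible to the applier; no copying or re-synchronization is needed.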



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
