hive-issues mailing list archives

From "Panagiotis Garefalakis (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-22959) Extend storage-api to expose FilterContext
Date Mon, 30 Mar 2020 16:45:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-22959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071127#comment-17071127
] 

Panagiotis Garefalakis edited comment on HIVE-22959 at 3/30/20, 4:44 PM:
-------------------------------------------------------------------------

Hey [~omalley], the idea here is to abstract the information that data-format consumers need
in order to enable more fine-grained filtering (e.g., ORC-577).

You are right, VRB does contain similar information, but the problem is that not all consumers
make use of VRB. For example, in Hive we are currently using batches of [ColumnVectors](https://github.com/apache/hive/blob/aa94b8d5cefc332c7269a0d8857a9778b9fe1b0c/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java)
instead.

The proposed MutableFilterContext also provides some optimizations, such as the borrowSelected
method, which reuses the allocated selected array across filters. It also exposes an immutable
context by default, to make it harder for API users to modify the context values when they shouldn't.
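
The two ideas in the last paragraph (reusing the selected array, and handing consumers a read-only view) can be sketched roughly as follows. This is only an illustration under assumptions: the type names ReadOnlyFilterSketch and MutableFilterSketch, and the methods setFilter, reset and immutable, are hypothetical, and the borrowSelected signature shown here is a guess that may differ from the one in the attached patches.

```java
// Hypothetical read-only view: consumers can inspect the filter but get no setters.
interface ReadOnlyFilterSketch {
    boolean isSelectedInUse();
    int[] getSelected();
    int getSelectedSize();
}

// Hypothetical mutable implementation; names are illustrative, not the storage-api signatures.
final class MutableFilterSketch implements ReadOnlyFilterSketch {
    private boolean selectedInUse;
    private int[] selected = new int[0];
    private int selectedSize;

    @Override public boolean isSelectedInUse() { return selectedInUse; }
    @Override public int[] getSelected() { return selected; }
    @Override public int getSelectedSize() { return selectedSize; }

    // Hand out the cached array when it is large enough, so that consecutive
    // filters do not allocate a fresh array for every batch.
    int[] borrowSelected(int minCapacity) {
        if (selected.length < minCapacity) {
            selected = new int[minCapacity];
        }
        return selected;
    }

    // Record which rows passed the filter; only the first 'size' entries are meaningful.
    void setFilter(int[] rows, int size) {
        if (size < 0 || size > rows.length) {
            throw new IllegalArgumentException("size out of range: " + size);
        }
        selectedInUse = true;
        selected = rows;
        selectedSize = size;
    }

    // Disable the filter but keep the array so that it can be borrowed again.
    void reset() {
        selectedInUse = false;
        selectedSize = 0;
    }

    // Expose this object through the read-only interface only.
    ReadOnlyFilterSketch immutable() {
        return this;
    }
}

final class FilterSketchDemo {
    public static void main(String[] args) {
        MutableFilterSketch ctx = new MutableFilterSketch();
        int[] first = ctx.borrowSelected(4);
        first[0] = 1;
        first[1] = 3;
        ctx.setFilter(first, 2);

        ReadOnlyFilterSketch view = ctx.immutable();
        System.out.println(view.getSelectedSize());

        ctx.reset();
        int[] second = ctx.borrowSelected(3);
        System.out.println(first == second); // the same array instance is handed out again
    }
}
```

Note that the read-only view in this sketch still returns the underlying array, so it only makes accidental modification harder; it does not prevent it.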


was (Author: pgaref):
Hey [~omalley], the idea here is to abstract the information that data-format consumers need
in order to enable more fine-grained filtering (e.g., ORC-611).

You are right, VRB does contain similar information, but the problem is that not all consumers
make use of VRB. For example, in Hive we are currently using batches of [ColumnVectors](https://github.com/apache/hive/blob/aa94b8d5cefc332c7269a0d8857a9778b9fe1b0c/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java)
instead.

The proposed MutableFilterContext also provides some optimizations, such as the borrowSelected
method, which reuses the allocated selected array across filters. It also exposes an immutable
context by default, to make it harder for API users to modify the context values when they shouldn't.

> Extend storage-api to expose FilterContext
> ------------------------------------------
>
>                 Key: HIVE-22959
>                 URL: https://issues.apache.org/jira/browse/HIVE-22959
>             Project: Hive
>          Issue Type: Sub-task
>          Components: storage-api
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0, storage-2.7.2
>
>         Attachments: HIVE-22959.1.patch, HIVE-22959.2.patch, HIVE-22959.3.patch, HIVE-22959.4.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> To enable row-level filtering at the ORC level (ORC-577) or, as an extension, ProDecode
> MapJoin (HIVE-22731), we need a common context class that will hold all the information
> the filter needs.
> I propose that this class be part of the storage-api, similar to the VectorizedRowBatch
> class, and hold the information below:
>  * A boolean variable showing whether the filter is enabled
>  * An int array storing the row IDs that are actually selected (passing the filter)
>  * An int variable storing the number of rows that passed the filter
>  
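
To illustrate how a consumer might read the three pieces of information listed above, here is a minimal sketch. The class SelectedRowsSketch and the method sumSelected are hypothetical names invented for this example; they are not part of the storage-api or of the attached patches.

```java
final class SelectedRowsSketch {
    // Sum a column, visiting only the rows that passed the filter.
    // When the filter is disabled, every row of the batch is visited.
    static long sumSelected(long[] column, int batchSize,
                            boolean filterEnabled, int[] selected, int selectedSize) {
        long sum = 0;
        if (filterEnabled) {
            // Only the first selectedSize entries of 'selected' are meaningful.
            for (int i = 0; i < selectedSize; i++) {
                sum += column[selected[i]];
            }
        } else {
            for (int row = 0; row < batchSize; row++) {
                sum += column[row];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] column = {10, 20, 30, 40};
        int[] selected = {1, 3, 0, 0}; // rows 1 and 3 passed the filter
        System.out.println(sumSelected(column, 4, true, selected, 2));  // 20 + 40
        System.out.println(sumSelected(column, 4, false, selected, 2)); // all four rows
    }
}
```

Keeping the count separate from the array length is what allows the array to be larger than the number of selected rows, so it can be allocated once and reused.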



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
