hive-issues mailing list archives

From "Ashish Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-19103) Nested structure Projection Push Down in Hive with ORC
Date Tue, 22 May 2018 13:33:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-19103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma updated HIVE-19103:
---------------------------------
    Component/s: ORC

> Nested structure Projection Push Down in Hive with ORC
> ------------------------------------------------------
>
>                 Key: HIVE-19103
>                 URL: https://issues.apache.org/jira/browse/HIVE-19103
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, ORC
>            Reporter: Ashish Sharma
>            Assignee: Ashish Sharma
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 2.3.2
>
>         Attachments: HIVE-19103-0.patch, HIVE-19103-1.patch, HIVE-19103-2.patch
>
>
> Read only the required columns from a nested struct schema.
> Example - 
> *Current state* - 
> Schema  -  struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
> Query - select c.e.f from t where c.e.f > 10;
> Current state - the entire c struct is read from the file and then filtered, because "hive.io.file.readcolumn.ids" is consulted, which selects all child columns of c for reading.
> Conf -
>      hive.io.file.readcolumn.ids = "2"
>      hive.io.file.readNestedColumn.paths = "c.e.f"
> Result -
> boolean[] include = [true,false,false,true,true,true,true,true]
> *Expected state* -
> Schema  -  struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
> Query - select c.e.f from t where c.e.f > 10;
> Expected state - instead of reading the entire c struct from the file, read only the f column by consulting "hive.io.file.readNestedColumn.paths".
> Conf -
>      hive.io.file.readcolumn.ids = "2"
>      hive.io.file.readNestedColumn.paths = "c.e.f"
> Result -
> boolean[] include = [true,false,false,true,false,true,true,false]
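
The two include arrays above can be reproduced with a small standalone sketch. It is not the code from the attached patches or from Hive. It assumes only that column types are numbered in pre-order starting from the root struct (as ORC does), and every name in it (Node, assign_ids, mark_subtree, include_current, include_expected) is hypothetical and exists only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a simplified type tree (hypothetical helper, not a Hive class)."""
    name: str
    children: list = field(default_factory=list)
    type_id: int = -1


def assign_ids(root):
    """Number every node in pre-order, root first, and return the nodes in id order."""
    nodes = []

    def visit(node):
        node.type_id = len(nodes)
        nodes.append(node)
        for child in node.children:
            visit(child)

    visit(root)
    return nodes


def mark_subtree(node, include):
    """Mark a node and all of its descendants as included."""
    include[node.type_id] = True
    for child in node.children:
        mark_subtree(child, include)


def include_current(root, column_ids):
    """Current state: each selected top-level column pulls in its whole subtree."""
    include = [False] * len(assign_ids(root))
    include[root.type_id] = True
    for idx in column_ids:
        mark_subtree(root.children[idx], include)
    return include


def include_expected(root, nested_paths):
    """Expected state: mark only the nodes along each dotted path,
    plus everything below the node where the path ends."""
    include = [False] * len(assign_ids(root))
    include[root.type_id] = True
    for path in nested_paths:
        node = root
        for part in path.split("."):
            node = next(c for c in node.children if c.name == part)
            include[node.type_id] = True
        mark_subtree(node, include)
    return include


# struct<a:int, b:bigint, c:struct<d:int, e:struct<f:int>, g:string>>
schema = Node("root", [
    Node("a"),
    Node("b"),
    Node("c", [Node("d"), Node("e", [Node("f")]), Node("g")]),
])

current = include_current(schema, [2])          # readcolumn.ids = "2"
expected = include_expected(schema, ["c.e.f"])  # readNestedColumn.paths = "c.e.f"
```

With this numbering the ids are root=0, a=1, b=2, c=3, d=4, e=5, f=6, g=7, so `current` marks ids 0 and 3 through 7, while `expected` marks only 0, 3, 5 and 6, skipping d and g.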



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
