hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-15664) LLAP text cache: improve first query perf
Date Thu, 19 Jan 2017 04:00:29 GMT

     [ https://issues.apache.org/jira/browse/HIVE-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-15664:
------------------------------------
    Attachment: HIVE-15664.WIP.patch

This implements items 1 and 2, as well as ORC dictionary.
Skipping is only supported on VectorDeserialize. I started looking at it; it should be easy to
do once the initial confusion is cleared up. VD doesn't support complex types anyway, so it
should be easy to map the new ORC columns to the original column indexes.
We don't expect that to result in a major gain, though (compared to 1-2-4), so I postponed it
for now.
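For illustration only (this is not code from the patch): with no complex types, each included column corresponds to exactly one top-level column, so mapping the dense post-skip column positions back to original column indexes could be a single pass over an inclusion mask. The class name, the method name, and the boolean[] mask representation are assumptions made for this sketch.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hive or ORC.
final class ColumnIndexMapping {
    /**
     * Returns an array whose element i is the original index of the
     * i-th included column. Assumes flat (non-complex) columns only.
     */
    static int[] denseToOriginal(boolean[] included) {
        int count = 0;
        for (boolean isIncluded : included) {
            if (isIncluded) {
                count++;
            }
        }
        int[] result = new int[count];
        int next = 0;
        for (int orig = 0; orig < included.length; orig++) {
            if (included[orig]) {
                result[next++] = orig;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Column 1 is skipped, so dense positions 0, 1, 2 map to 0, 2, 3.
        boolean[] included = {true, false, true, true};
        System.out.println(Arrays.toString(denseToOriginal(included))); // prints [0, 2, 3]
    }
}
```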
Unfortunately, 1 and 2 don't speed it up enough. We need to do 4: return VRBs from VectorDeserialize
and offload ORC writing to a background thread. I was looking into that today. I need to wrap
my head around the variety of array indexes and integer lists that the various parts use. It would
also be difficult interface-wise. Will probably piggyback on Orc...Batch
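A minimal sketch of the general shape of item 4, under stated assumptions and not taken from the patch: each batch is handed to the query pipeline immediately, while the cache write is queued on a single background thread. BackgroundCacheWriter, the generic batch type B, and both Consumer callbacks are hypothetical stand-ins; the real Hive and ORC interfaces are different and are exactly the hard part mentioned above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical class, not a Hive or ORC API.
final class BackgroundCacheWriter<B> implements AutoCloseable {
    // A single thread keeps cache writes in the order batches arrived.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final Consumer<B> downstream; // stands in for the query pipeline
    private final Consumer<B> cacheWrite; // stands in for the ORC cache write

    BackgroundCacheWriter(Consumer<B> downstream, Consumer<B> cacheWrite) {
        this.downstream = downstream;
        this.cacheWrite = cacheWrite;
    }

    /** Queues the cache write, then passes the batch on without waiting for it. */
    void accept(B batch) {
        writer.execute(() -> cacheWrite.accept(batch));
        downstream.accept(batch);
    }

    /** Waits for queued writes to finish before returning. */
    @Override
    public void close() {
        writer.shutdown();
        try {
            if (!writer.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("cache writes did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for cache writes", e);
        }
    }

    public static void main(String[] args) {
        try (BackgroundCacheWriter<String> w = new BackgroundCacheWriter<>(
                b -> System.out.println("pipeline got " + b),
                b -> System.out.println("cache wrote " + b))) {
            w.accept("batch-1");
            w.accept("batch-2");
        }
    }
}
```

The sketch assumes a batch is not modified after it is handed over. If the pipeline reuses batch objects, the writer would need its own copy or an explicit ownership handoff.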

> LLAP text cache: improve first query perf
> -----------------------------------------
>
>                 Key: HIVE-15664
>                 URL: https://issues.apache.org/jira/browse/HIVE-15664
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>         Attachments: HIVE-15664.WIP.patch
>
>
> 1) Don't use ORC dictionary.
> 2) Use VectorDeserialize.
> 3) Don't parse the columns that are not included (cannot avoid reading them).
> 4) Send VRB to the pipeline and write ORC in parallel (in background).
> Also add an option to disable the encoding pipeline server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
