drill-issues mailing list archives

From "Venki Korukanti (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3209) [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists
Date Thu, 01 Oct 2015 23:28:26 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940591#comment-14940591 ]

Venki Korukanti commented on DRILL-3209:
----------------------------------------

The performance degradation with the Hive native scan is due to lower parallelization. Logged
DRILL-3884 to fix the parallelization issue.

> [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3209
>                 URL: https://issues.apache.org/jira/browse/DRILL-3209
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization, Storage - Hive
>            Reporter: Jason Altekruse
>            Assignee: Venki Korukanti
>             Fix For: 1.2.0
>
>         Attachments: tpch13-native-scan-off.sys.drill, tpch13-native-scan-on.sys.drill
>
>
> All reads against Hive are currently done through the Hive SerDe interface. While this
> provides the most flexibility, the API is not optimized for maximum performance when reading
> data into Drill's native data structures. For Parquet- and text-file-backed tables, we can
> plan these reads as Drill native reads. Currently, native reads of these file types provide
> untyped data. Although Parquet files carry type metadata, we do not yet use that type
> information during planning. For text files, we read every file as a list of VARCHARs. In
> both cases, casts will need to be injected to provide the same data types as reads through
> the SerDe interface.
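For illustration only, here is a minimal Python sketch of the cast-injection idea for a text-backed table, expressed as the Drill SQL a native read would need. It is not Drill's planner code, which injects casts into the query plan rather than generating SQL. Assumptions: the Drill text reader exposes each row as the VARCHAR array `columns`; the `dfs` storage plugin is configured with the table's field delimiter; and `HIVE_TO_DRILL` and `native_text_projection` are hypothetical names covering only a small subset of Hive types.

```python
# Hypothetical subset of Hive primitive types mapped to Drill SQL types (an assumption, not exhaustive).
HIVE_TO_DRILL = {
    "int": "INT",
    "bigint": "BIGINT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "date": "DATE",
}


def native_text_projection(table_path, hive_columns):
    """Hypothetical helper: build a Drill query that reads a text file natively and
    casts each positional VARCHAR field to the type declared in the Hive metastore.

    hive_columns is a list of (name, hive_type) pairs in file column order.
    """
    exprs = []
    for i, (name, hive_type) in enumerate(hive_columns):
        field = f"columns[{i}]"  # native text reads yield VARCHARs only
        t = hive_type.lower()
        if t == "string":
            # Already VARCHAR, so no cast is needed to match the SerDe read.
            exprs.append(f"{field} AS `{name}`")
            continue
        drill_type = HIVE_TO_DRILL.get(t)
        if drill_type is None:
            raise ValueError(f"no cast rule for Hive type {hive_type!r}")
        # Inject a cast so the result type matches what the SerDe path would return.
        exprs.append(f"CAST({field} AS {drill_type}) AS `{name}`")
    return "SELECT " + ", ".join(exprs) + f" FROM dfs.`{table_path}`"


if __name__ == "__main__":
    print(native_text_projection(
        "/warehouse/orders",
        [("o_orderkey", "bigint"), ("o_totalprice", "double"), ("o_comment", "string")],
    ))
```

Unknown types raise an error rather than silently falling back to VARCHAR, since an uncast column would return a different type than the SerDe read.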



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
