Date: Thu, 1 Oct 2015 19:14:27 +0000 (UTC)
From: "Chun Chang (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Updated] (DRILL-3209) [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists

[ https://issues.apache.org/jira/browse/DRILL-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun Chang updated DRILL-3209:
------------------------------
    Attachment: tpch13-native-scan-on.sys.drill
                tpch13-native-scan-off.sys.drill

> [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-3209
> URL: https://issues.apache.org/jira/browse/DRILL-3209
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization, Storage - Hive
> Reporter: Jason Altekruse
> Assignee: Venki Korukanti
> Fix For: 1.2.0
>
> Attachments: tpch13-native-scan-off.sys.drill, tpch13-native-scan-on.sys.drill
>
>
> All reads against Hive are currently done through the Hive SerDe interface. While this provides the most flexibility, the API is not optimized for maximum performance when reading data into Drill's native data structures. For Parquet- and text-file-backed tables, we can plan these reads as Drill native reads.
>
> Currently, native reads of these file types provide untyped data. Although Parquet files carry type metadata, we do not currently use that type information during planning. For text files, we read every file as lists of varchars. In both cases, casts will need to be injected so that the native reads return the same data types as reads through the SerDe interface.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
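
As a rough illustration of the cast injection described in the issue, the sketch below builds a projection that casts the varchar fields produced by a native text read to the types the Hive table declares. It is not Drill's planner code: the function name, the type mapping, and the sample schema are all hypothetical, and it assumes the text reader exposes each row as a list of varchars named `columns`.

```python
# Hypothetical mapping from Hive column types to SQL types usable in a CAST.
# Illustrative subset only; it is not taken from Drill's source.
HIVE_TO_SQL_TYPE = {
    "int": "INT",
    "bigint": "BIGINT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "string": "VARCHAR",
}


def build_cast_projection(hive_columns):
    """Return a SELECT list casting positional text fields to Hive's types.

    hive_columns is an ordered list of (column_name, hive_type) pairs, as the
    Hive table definition would declare them. Each row from the text read is
    assumed to be a list of varchars named `columns`.
    """
    items = []
    for index, (name, hive_type) in enumerate(hive_columns):
        sql_type = HIVE_TO_SQL_TYPE.get(hive_type.lower())
        if sql_type is None:
            # Types outside the illustrative subset are rejected, not guessed.
            raise ValueError(f"no cast rule for Hive type {hive_type!r}")
        items.append(f"CAST(columns[{index}] AS {sql_type}) AS `{name}`")
    return ", ".join(items)


if __name__ == "__main__":
    # Made-up schema, used only to show the shape of the generated projection.
    schema = [("c_custkey", "int"), ("c_name", "string"), ("c_acctbal", "double")]
    print("SELECT " + build_cast_projection(schema))
```

A real implementation would have to insert the equivalent casts into the query plan rather than into SQL text, and would need rules for every Hive type the SerDe path supports; the string form here is only meant to make the idea concrete.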