Date: Thu, 11 Jan 2018 20:22:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@drill.apache.org
Subject: [jira] [Commented] (DRILL-5846) Improve Parquet Reader Performance for Flat Data types

    [ https://issues.apache.org/jira/browse/DRILL-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322887#comment-16322887 ]

ASF GitHub Bot commented on DRILL-5846:
---------------------------------------

Github user sachouche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1060#discussion_r161039122

    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -874,6 +880,46 @@ public void setSafe(int index, BigDecimal value) {
           set(index, value);
         }
     
    +    /**
    +     * Copies the bulk input into this value vector and extends its capacity if necessary.
    +     * @param input bulk input
    +     */
    +    public void setSafe(VLBulkInput input) {
    +      setSafe(input, null);
    +    }
    +
    +    /**
    +     * Copies the bulk input into this value vector and extends its capacity if necessary. The callback
    +     * mechanism allows decoration as caller is invoked for each bulk entry.
    +     *
    +     * @param input bulk input
    +     * @param callback a bulk input callback object (optional)
    +     */
    +    public void setSafe(VLBulkInput input, VLBulkInput.BulkInputCallback callback) {
    --- End diff --

    This code is not Parquet-specific; any reader that wants to load data in bulk can use it. Vectors currently expose Mutator APIs for loading single values only. I see no good reason not to also accept bulk values: loading one value at a time prevents code optimizations that a bulk call makes possible. Look at the ByteBuffer APIs: they let you pass single byte values, but also byte arrays.


> Improve Parquet Reader Performance for Flat Data types
> -------------------------------------------------------
>
>                 Key: DRILL-5846
>                 URL: https://issues.apache.org/jira/browse/DRILL-5846
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>    Affects Versions: 1.11.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>              Labels: performance
>             Fix For: 1.13.0
>
>
> The Parquet Reader is a key use case for Drill. This JIRA is an attempt to further improve Parquet Reader performance, as several users reported that Parquet parsing accounts for the lion's share of overall query execution time. It tracks flat data types only, as nested data types might involve functional and processing enhancements (e.g., a nested column can be seen as a document; a user might want to perform operations scoped at the document level, with no need to span all rows). Another JIRA will be created to handle the nested-column use case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
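The review comment argues that value vectors could accept bulk input in addition to single values, much as `java.nio.ByteBuffer` offers both `put(byte)` and `put(byte[])`. The following is a minimal sketch of that idea under stated assumptions: `BulkVectorSketch` and its methods are hypothetical and heavily simplified (a growable `int[]`), not Drill's actual `ValueVector`/`Mutator` or `VLBulkInput` API. The point it shows is that the bulk setter performs one capacity check and one `System.arraycopy` for the whole batch, where the single-value setter pays a check and a store per value.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical, simplified fixed-width int vector used only to illustrate
 * single-value vs. bulk loading. Not Drill's ValueVector/Mutator API.
 */
public class BulkVectorSketch {
    private int[] data;
    private int valueCount;

    public BulkVectorSketch(int initialCapacity) {
        // Keep capacity >= 1 so that doubling in ensureCapacity always grows.
        data = new int[Math.max(1, initialCapacity)];
    }

    /** Single-value setter: capacity is checked on every call. */
    public void setSafe(int index, int value) {
        ensureCapacity(index + 1);
        data[index] = value;
        valueCount = Math.max(valueCount, index + 1);
    }

    /** Bulk setter: appends all values after one capacity check and one array copy. */
    public void setSafe(int[] values) {
        ensureCapacity(valueCount + values.length);
        System.arraycopy(values, 0, data, valueCount, values.length);
        valueCount += values.length;
    }

    public int get(int index) {
        if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index + ", count " + valueCount);
        }
        return data[index];
    }

    public int getValueCount() { return valueCount; }

    public int getCapacity() { return data.length; }

    /** Doubles the backing array until it can hold `required` values. */
    private void ensureCapacity(int required) {
        if (required <= data.length) {
            return;
        }
        int newCapacity = data.length;
        while (newCapacity < required) {
            newCapacity *= 2;
        }
        data = Arrays.copyOf(data, newCapacity);
    }

    public static void main(String[] args) {
        // java.nio.ByteBuffer exposes both styles: a relative single put and a bulk put.
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1);               // single value
        buf.put(new byte[] {2, 3, 4});   // bulk
        System.out.println(buf.position());

        // The same two styles on the hypothetical vector.
        BulkVectorSketch v = new BulkVectorSketch(2);
        v.setSafe(0, 10);                  // single value
        v.setSafe(new int[] {20, 30, 40}); // bulk append, grows capacity once (2 -> 4)
        System.out.println(v.getValueCount() + " " + v.get(3) + " " + v.getCapacity());
    }
}
```

Running `main` prints `4` and then `4 40 4`. The bulk overload complements the single-value setter rather than replacing it, which mirrors the `ByteBuffer` analogy: callers that already hold a batch can hand it over in one call, while per-value callers are unaffected.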