From: bridgetb@apache.org
To: commits@drill.apache.org
Subject: drill git commit: add info to doc for DRILL-DRILL-5379
Date: Wed, 9 Aug 2017 21:06:39 +0000 (UTC)

Repository: drill
Updated Branches:
  refs/heads/gh-pages ace75e58f -> 0ce771f06

add info to doc for DRILL-DRILL-5379

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/0ce771f0
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/0ce771f0
Diff:
http://git-wip-us.apache.org/repos/asf/drill/diff/0ce771f0

Branch: refs/heads/gh-pages
Commit: 0ce771f060419416d51dfe2841b2b79a52220e77
Parents: ace75e5
Author: Bridget Bevens
Authored: Wed Aug 9 14:05:17 2017 -0700
Committer: Bridget Bevens
Committed: Wed Aug 9 14:05:17 2017 -0700

----------------------------------------------------------------------
 .../040-parquet-format.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/drill/blob/0ce771f0/_docs/data-sources-and-file-formats/040-parquet-format.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources-and-file-formats/040-parquet-format.md b/_docs/data-sources-and-file-formats/040-parquet-format.md
index 149e7ac..416e3c3 100644
--- a/_docs/data-sources-and-file-formats/040-parquet-format.md
+++ b/_docs/data-sources-and-file-formats/040-parquet-format.md
@@ -1,6 +1,6 @@
 ---
 title: "Parquet Format"
-date: 2017-03-27 18:38:33 UTC
+date: 2017-08-09 21:05:20 UTC
 parent: "Data Sources and File Formats"
 ---
 [Apache Parquet](http://parquet.incubator.apache.org/documentation/latest) has the following characteristics:
@@ -51,20 +51,23 @@ Use the `store.format` option to set the CTAS output format of a Parquet row gro

 Use the ALTER command to set the `store.format` option.

-``ALTER SESSION SET `store.format` = 'parquet';``
-``ALTER SYSTEM SET `store.format` = 'parquet';``
+``ALTER SYSTEM|SESSION SET `store.format` = 'parquet';``

 ### Configuring the Size of Parquet Files
 Configuring the size of Parquet files by setting the `store.parquet.block-size` can improve write performance. The block size is the size of MFS, HDFS, or the file system.

 The larger the block size, the more memory Drill needs for buffering data. Parquet files that contain a single block maximize the amount of data Drill stores contiguously on disk. Given a single row group per file, Drill stores the entire Parquet file onto the block, avoiding network I/O.

-To maximize performance, set the target size of a Parquet row group to the number of bytes less than or equal to the block size of MFS, HDFS, or the file system by using the `store.parquet.block-size`:
+To maximize performance, set the target size of a Parquet row group to the number of bytes less than or equal to the block size of MFS, HDFS, or the file system using the `store.parquet.block-size` option, as shown:

-``ALTER SESSION SET `store.parquet.block-size` = 536870912;``
-``ALTER SYSTEM SET `store.parquet.block-size` = 536870912``
+``ALTER SYSTEM|SESSION SET `store.parquet.block-size` = 536870912;``

-The default block size is 536870912 bytes.
+The default block size is 536870912 bytes.
+
+###Configuring the HDFS Block Size for Parquet Files
+Drill 1.11 introduces the `store.parquet.writer.use_single_fs_block` option, which enables Drill to write a Parquet file as a single file system block without changing the default file system block size. Query performance improves when Drill reads Parquet files as a single block on the file system. When the `store.parquet.writer.use_single_fs_block` option is enabled, the `store.parquet.block-size` setting determines the block size of the Parquet files created. The default setting for the `store.parquet.writer.use_single_fs_block` option is 'false'. Use the SET command to enable or disable the option, as shown:
+
+    ALTER SYSTEM|SESSION SET store.parquet.writer.use_single_fs_block = 'true|false';

 ### Type Mapping
 The high correlation between Parquet and SQL data types makes reading Parquet files effortless in Drill. Writing to Parquet files takes more work than reading. Because SQL does not support all Parquet data types, to prevent Drill from inferring a type other than one you want, use the [cast function]({{ site.baseurl }}/docs/data-type-conversion/#cast) Drill offers more liberal casting capabilities than SQL for Parquet conversions if the Parquet data is of a logical type.