From: tshiran@apache.org
To: commits@drill.apache.org
Date: Mon, 23 Nov 2015 21:54:10 -0000
Subject: [27/30] drill git commit: Update 110-s3-storage-plugin.md

Update 110-s3-storage-plugin.md

Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/58f83b75
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/58f83b75
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/58f83b75

Branch: refs/heads/gh-pages
Commit: 58f83b750fdf3ef1199ba3827e03e998de1cc1c3
Parents: 42505c9
Author: Abhi
Authored: Sun Nov 22 23:03:01 2015 -0800
Committer: Tomer Shiran
Committed: Mon Nov 23 10:11:58 2015 -0800

----------------------------------------------------------------------
 .../plugins/110-s3-storage-plugin.md | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/drill/blob/58f83b75/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md b/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md
index 022a6dc..b7cad62 100644
--- a/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md
+++ b/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md
@@ -42,7 +42,7 @@ Enable S3 storage plugin if you already have one configured or you can add a new
 
 You should now be able to talk to data stored on S3 using the S3a library.
 
-## S3 Example
+## Example S3 Storage Plugin
 
 ```
 {
@@ -81,4 +81,20 @@ You should now be able to talk to data stored on S3 using the S3a library.
   }
 }
 ```
+## Querying Parquet Format Files on S3
+Drill uses the Hadoop FileSystem API to read S3 input files, and that FileSystem implementation in turn uses Apache HttpClient. HttpClient has a default limit of four simultaneous requests and queues any subsequent S3 requests. A Drill query on Parquet files that selects a large number of columns, or a `SELECT *` query, issues many S3 requests and can fail with a `ConnectionPoolTimeoutException`.
+
+Fortunately, as part of the S3a implementation in Hadoop 2.7.1, the HttpClient connection limit is exposed as a configuration parameter, `fs.s3a.connection.maximum`, which you can raise to avoid `ConnectionPoolTimeoutException`. To set it, add the property to the `conf/core-site.xml` file in your Drill installation directory:
+
+```
+<configuration>
+  ...
+
+  <property>
+    <name>fs.s3a.connection.maximum</name>
+    <value>100</value>
+  </property>
+
+</configuration>
+```
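Why a small connection pool fails on wide Parquet scans can be shown with a back-of-the-envelope model. The sketch below is an idealized model, not HttpClient's actual pooling algorithm: it assumes all requests arrive at once, each holds a pooled connection for a fixed time, waiters are served first-in first-out, and a waiter gives up after a fixed timeout. The function name `count_pool_timeouts` and all numbers are hypothetical.

```python
def count_pool_timeouts(num_requests: int, pool_size: int,
                        request_ms: int, timeout_ms: int) -> int:
    """Idealized FIFO pool model (hypothetical, for illustration only).

    Assumptions: every request arrives at t=0, each one holds a pooled
    connection for request_ms, waiters are served in arrival order, and a
    waiter gives up once it has waited longer than timeout_ms.
    Returns the number of requests that time out.
    """
    if pool_size < 1 or request_ms < 1:
        raise ValueError("pool_size and request_ms must be positive")
    if num_requests < 0 or timeout_ms < 0:
        raise ValueError("num_requests and timeout_ms must be non-negative")

    # Connections become free in "waves" at t = 0, request_ms, 2*request_ms, ...
    # Each wave admits pool_size waiting requests. Only waves that start at
    # or before timeout_ms can serve a request before it gives up.
    waves_before_timeout = timeout_ms // request_ms + 1
    served = min(num_requests, pool_size * waves_before_timeout)
    return num_requests - served


if __name__ == "__main__":
    # Hypothetical workload: 200 S3 requests, 500 ms each, 5 s pool timeout.
    print(count_pool_timeouts(200, 4, 500, 5000))    # pool of 4
    print(count_pool_timeouts(200, 100, 500, 5000))  # pool of 100
```

With these made-up inputs, a pool of 4 serves only 4 × 11 = 44 requests before the timeout and 156 fail, while a pool of 100 serves all 200. The point of the model is only that failures grow with the number of concurrent requests and shrink as the pool size goes up, which is why raising `fs.s3a.connection.maximum` helps.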
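If you manage `conf/core-site.xml` with scripts rather than by hand, the property edit described above can be automated. The following is a minimal sketch that assumes the standard Hadoop `<configuration>/<property>/<name>/<value>` layout; the helpers `set_hadoop_property` and `get_hadoop_property` are hypothetical and use only Python's standard `xml.etree.ElementTree` module. The value `100` comes from the example above and is not a tuning recommendation.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def set_hadoop_property(xml_text: str, name: str, value: str) -> str:
    """Hypothetical helper: set a property in a Hadoop-style <configuration>
    document, updating an existing <property> or appending a new one.
    Returns the updated XML as a string."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")

    # Update the first existing <property> whose <name> matches.
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return ET.tostring(root, encoding="unicode")

    # No match: append a new <property><name/><value/></property> block.
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def get_hadoop_property(xml_text: str, name: str) -> Optional[str]:
    """Hypothetical helper: return the <value> text for `name`, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value")
    return None


if __name__ == "__main__":
    core_site = "<configuration></configuration>"
    updated = set_hadoop_property(core_site, "fs.s3a.connection.maximum", "100")
    print(updated)
```

Note that re-serializing with ElementTree drops XML comments and reformats whitespace, so keep a copy of the original `core-site.xml` before overwriting it.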