drill-commits mailing list archives

From tshi...@apache.org
Subject [27/30] drill git commit: Update 110-s3-storage-plugin.md
Date Mon, 23 Nov 2015 21:54:10 GMT
Update 110-s3-storage-plugin.md


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/58f83b75
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/58f83b75
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/58f83b75

Branch: refs/heads/gh-pages
Commit: 58f83b750fdf3ef1199ba3827e03e998de1cc1c3
Parents: 42505c9
Author: Abhi <abhipol@users.noreply.github.com>
Authored: Sun Nov 22 23:03:01 2015 -0800
Committer: Tomer Shiran <tshiran@gmail.com>
Committed: Mon Nov 23 10:11:58 2015 -0800

----------------------------------------------------------------------
 .../plugins/110-s3-storage-plugin.md              | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/58f83b75/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md b/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md
index 022a6dc..b7cad62 100644
--- a/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md
+++ b/_docs/connect-a-data-source/plugins/110-s3-storage-plugin.md
@@ -42,7 +42,7 @@ Enable S3 storage plugin if you already have one configured or you can add a new
 
 You should now be able to talk to data stored on S3 using the S3a library.
 
-## S3 Example
+## Example S3 Storage Plugin
 
 ```
 {
@@ -81,4 +81,20 @@ You should now be able to talk to data stored on S3 using the S3a library.
   }
 }
 ```
+## Querying Parquet Format Files on S3
 
+Drill uses the Hadoop FileSystem API to read S3 input files, which in turn uses Apache HttpClient. HttpClient has a default limit of four simultaneous requests and queues any subsequent S3 requests. A Drill query that selects a large number of columns, or a SELECT * query, on Parquet-formatted files issues many S3 requests and can fail with a ConnectionPoolTimeoutException.
+
+Fortunately, the S3a implementation in Hadoop 2.7.1 exposes HttpClient's connection limit as a configuration parameter, which can be raised to avoid the ConnectionPoolTimeoutException. Set this parameter in the conf/core-site.xml file in your Drill installation directory:
+
+```
+<configuration>
+  ...
+  
+  <property>
+    <name>fs.s3a.connection.maximum</name>
+    <value>100</value>
+  </property>
+
+</configuration>
+```

