drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Bünger (JIRA) <j...@apache.org>
Subject [jira] [Updated] (DRILL-5216) Make use of FetchSize in JDBC storage plugin
Date Wed, 25 Jan 2017 04:03:26 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Thomas Bünger updated DRILL-5216:
---------------------------------
    Description: 
For remotely located database servers it usually is preferred to increase the fetch size for
selected or all queries to limit number of roundtrips.

This could either be controlled via plugin config or on a query basis via comments.

One has also to take metadata retrieval into account, as in larger scenarios - as in mine
- the Oracle cluster hosts thousands of schemas and the small fetchsize results in hundres
of individual roundtrips.
In the end every Drill query against this storage takes at least a minute justs for querying
the metadata.

Analysis so far for metadata-retrieving SQL calls:
So far, Drill is using the JDBC metadata API {{java.sql.DatabaseMetaData.getSchemas()}} inside
JdbcStoragePlugin.java and could set an appropriate fetchsize before iterating the result
set.
I've tested this locally and improved latency a lot, but am note sure how this affects other
non-oracle JDBC drivers.

The other (potentially long) query is the table enumeration.
>From what I've seen is Drill not calling the JDBC driver directly, but goes through apache.calcite
calling {{getTableNames()}} which under the hood calls {{java.sql.DatabaseMetaData.getTables()}}
and also contributes to slow metadata retrieval due to small default fetch size.

  was:
The metadata retrieval uses the default fetchsize for the underlying JDBC driver, which in
case of Oracle is only 10.
In larger scenarios - as in mine - the Oracle cluster hosts thousands of schemas and the small
fetchsize results in hundres of individual roundtrips.
In the end every Drill query against this storage takes at least a minute (server is remote)

So far, Drill is using the JDBC metadata API {{java.sql.DatabaseMetaData.getSchemas()}} inside
JdbcStoragePlugin.java and could set an appropriate fetchsize before iterating the result
set.
I've tested this locally and improved latency a lot, but am note sure how this affects other
non-oracle JDBC drivers.

The other (potentially long) query is the table enumeration.
>From what I've seen is Drill not calling the JDBC driver directly, but goes through apache.calcite
calling {{getTableNames()}} which under the hood calls {{java.sql.DatabaseMetaData.getTables()}}
and also contributes to slow metadata retrieval due to small default fetch size.


> Make use of FetchSize in JDBC storage plugin
> --------------------------------------------
>
>                 Key: DRILL-5216
>                 URL: https://issues.apache.org/jira/browse/DRILL-5216
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JDBC
>    Affects Versions: 1.9.0
>         Environment: drill-embedded on ubuntu client - connected to a remote Oracle
>            Reporter: Thomas Bünger
>            Priority: Minor
>
> For remotely located database servers it usually is preferred to increase the fetch size
for selected or all queries to limit number of roundtrips.
> This could either be controlled via plugin config or on a query basis via comments.
> One has also to take metadata retrieval into account, as in larger scenarios - as in
mine - the Oracle cluster hosts thousands of schemas and the small fetchsize results in hundres
of individual roundtrips.
> In the end every Drill query against this storage takes at least a minute justs for querying
the metadata.
> Analysis so far for metadata-retrieving SQL calls:
> So far, Drill is using the JDBC metadata API {{java.sql.DatabaseMetaData.getSchemas()}}
inside JdbcStoragePlugin.java and could set an appropriate fetchsize before iterating the
result set.
> I've tested this locally and improved latency a lot, but am note sure how this affects
other non-oracle JDBC drivers.
> The other (potentially long) query is the table enumeration.
> From what I've seen is Drill not calling the JDBC driver directly, but goes through apache.calcite
calling {{getTableNames()}} which under the hood calls {{java.sql.DatabaseMetaData.getTables()}}
and also contributes to slow metadata retrieval due to small default fetch size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message