drill-issues mailing list archives

From "Magnus Pierre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3180) Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and Netezza from Apache Drill
Date Tue, 18 Aug 2015 18:49:46 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701776#comment-14701776 ]

Magnus Pierre commented on DRILL-3180:
--------------------------------------

Some findings:
The plugin is much improved, but I found some issues that should be looked at:

Table metadata is only available if the URL contains just the path to the server and no
target database (MySQL). This is not how it should work, since it excludes many databases,
such as Netezza, that require a target database as part of the URL (at least when
I tried them).

I personally do not think it is a good idea to open the database connection in the
plugin's constructor. It is not possible to create a storage plugin in Drill for a
database that is not online, even though the configuration is valid. One alternative
is to check whether the plugin is enabled in the JSON before deciding to connect to the database.

The constructor needs to throw the correct error message when the JSON is valid but the
connection fails; otherwise it will be hard to understand where the real problem lies.
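
As a rough illustration of the two points above, the connection could be deferred until
first use, with the enabled flag checked first and connection failures reported separately
from configuration problems. This is only a sketch under assumptions: the class name
LazyJdbcPlugin, its fields, and its methods are hypothetical and are not taken from the
attached plugin code or from Drill's storage plugin API.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical stand-in for a JDBC storage plugin; not Drill's actual class. */
final class LazyJdbcPlugin {
    private final String name;
    private final String url;
    private final boolean enabled;
    private Connection connection; // opened on first use, not in the constructor

    LazyJdbcPlugin(String name, String url, boolean enabled) {
        // Only store the configuration here, so the plugin can be created
        // even while the target database is offline.
        this.name = name;
        this.url = url;
        this.enabled = enabled;
    }

    /** Returns true once a connection has been opened. */
    boolean isConnected() {
        return connection != null;
    }

    /** Connects lazily; fails with a message that points at the connection, not the JSON. */
    synchronized Connection getConnection() {
        if (!enabled) {
            throw new IllegalStateException("Storage plugin '" + name + "' is disabled");
        }
        try {
            if (connection == null || connection.isClosed()) {
                connection = DriverManager.getConnection(url);
            }
            return connection;
        } catch (SQLException e) {
            // The configuration was accepted; the failure is in reaching the database.
            throw new IllegalStateException(
                "Storage plugin '" + name + "' has a valid configuration but could not connect to "
                    + url + ": " + e.getMessage(), e);
        }
    }
}
```

With this shape, creating the plugin for an offline database succeeds, and the failure
only surfaces (with the plugin name and URL in the message) when a query first needs the
connection.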

SHOW TABLES does not work for schemas returned from the JDBC plugin.

Join pushdown works for simple constructs such as:
select * from 
   mp.employees.`employees` e
INNER JOIN 
   mp.employees.`salaries` s
ON e.`EMP_NO` = s.`EMP_NO`
WHERE  s.`to_date` > CURRENT_DATE 

But it does not happen when the filter is written as part of the join condition:
select * from 
   mp.employees.`employees` e
INNER JOIN 
   mp.employees.`salaries` s
ON e.`EMP_NO` = s.`EMP_NO`
AND  s.`to_date` > CURRENT_DATE 

This form is quite commonplace.

A more complex query:
select * from 
   mp.employees.`employees` e
INNER JOIN 
   mp.employees.`salaries` s
ON e.`EMP_NO` = s.`EMP_NO`
INNER JOIN 
   mp.employees.`dept_emp` ed
ON e.`EMP_NO` = ed.`EMP_NO` 
WHERE s.`to_date` > CURRENT_DATE and ed.`to_date` > CURRENT_DATE

fails with:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException:
Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8]
Fragment 1:0 [Error Id: b34092c2-9225-4e37-9955-3a28b6215d97 on administorsmbp2.lan:31010]

I have not investigated it further.




> Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and Netezza from
Apache Drill
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3180
>                 URL: https://issues.apache.org/jira/browse/DRILL-3180
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 1.0.0
>            Reporter: Magnus Pierre
>            Assignee: Jacques Nadeau
>              Labels: Drill, JDBC, plugin
>             Fix For: 1.2.0
>
>         Attachments: patch.diff, pom.xml, storage-mpjdbc.zip
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I have developed the base code for a JDBC storage-plugin for Apache Drill. The code is
primitive but constitutes a good starting point for further coding. Today it provides primitive
support for SELECT against RDBMS with JDBC. 
> The goal is to provide complete SELECT support against RDBMS with push down capabilities.
> Currently the code is using standard JDBC classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
