hawq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Pearce <Michael.Pea...@ig.com>
Subject Re: PXF JDBC plugin
Date Tue, 04 Apr 2017 03:15:17 GMT
Hi Jon

I think on issue 1 and 2 are valid improvements that can be made, add some Jira's for these.
Looking at issue 1 it seems quite trivial for someone to contribute the solution for.

On the question front:

1) from my understanding and using this plugin so far this is what the partion by interval
allows you to control the number of partitions that would be generated, e.g. This is why the
sample in document has 1 year so it only creates 2 fragments.  You can also partition by an
enum field here the number of partitions is the number of enums.

2) I haven't checked specifically, saying that I haven't run into an issue when trialling
this myself. Obviously this is contributed by Devin he will be the definitive source here.
Though I guess can always specially test these scenarios.

Sent using OWA for iPhone
________________________________________
From: Jon Roberts <jroberts@pivotal.io>
Sent: Monday, April 3, 2017 11:57:59 PM
To: dev@hawq.incubator.apache.org
Subject: Re: PXF JDBC plugin

https://github.com/apache/incubator-hawq/tree/master/pxf/pxf-jdbc

Issue 1: Security
The example has the username and password in the connection string.

LOCATION ('pxf://localhost:51200/demodb.myclass'
              '?PROFILE=JDBC'
              '&JDBC_DRIVER=com.mysql.jdbc.Driver'
              '&DB_URL=jdbc:mysql://
192.168.200.6:3306/demodb&USER=root&PASS=root'
              )

Any chance we can get this changed to a connection profile that points to a
file outside of the database?  For Greenplum database and S3, the LOCATION
syntax includes "config=/path/to/config_file".  The config_file contains
the S3 credentials.  This seems like a good pattern to follow.

As it is right now, anyone that can connect to the database will be able to
see the username and password of the JDBC connection.

Issue 2: Extra Properties

Some JDBC drivers will need many additional properties beyond the URL and
this requires setting it with a put to a Properties variable.  An example
of this is Oracle's defaultRowPrefetch property that needs to be updated
from the default of 10 which is designed for OLTP to something larger like
2000 which is more ideal for data extracts.

Additionally, you will need the ability to set the isolation level which is
done with setTransactionIsolation on the Connection.  I don't believe you
can set this on the connection URL either.  Many SQL Server and DB2
database still don't use snapshot isolation and use dirty reads instead.
Without a dirty read here, your query will block modifications to the table
with a "blocking lock".

So for the configuration file will need both an extra properties variable
that is delimited so you can multiple and a isolation level indicator.

Questions.
1.  How do you manage the maximum number of pxf instances?  For example, if
you partition like this:
"PARTITION_BY=cdate:date&RANGE=2008-01-01:2010-01-01&INTERVAL=1:day", will
you create 730 pxf instances and thus 730 concurrent queries to the source
database?
2.  Have you tested special characters like null, escape, carriage return,
newline, and your delimiter?  Maybe that is handled automatically by
pxfwritable_import so this isn't an issue.



Jon Roberts
Principal Engineer | jroberts@pivotal.io | 615-426-8661

On Mon, Apr 3, 2017 at 3:18 PM, Michael Pearce <Michael.Pearce@ig.com>
wrote:

> I mean the readme file :) at that location.
> ________________________________________
> From: Michael Pearce
> Sent: Monday, April 3, 2017 9:15:48 PM
> To: dev@hawq.incubator.apache.org
> Subject: Re: PXF JDBC plugin
>
> On master there is the document that was added.
>
> https://github.com/apache/incubator-hawq/tree/master/pxf/pxf-jdbc
>
> Reading this I think answers some of the questions.
>
>
>
>
>
> Sent using OWA for iPhone
> ________________________________________
> From: Vineet Goel <vvineet@apache.org>
> Sent: Monday, April 3, 2017 6:33:07 PM
> To: dev@hawq.incubator.apache.org
> Subject: Re: PXF JDBC plugin
>
> Devin and others,
>
> It would be great if you can expand a bit more on the implementation or
> point to a document?
>
> Thanks!
>
>
>
> On Mon, Apr 3, 2017 at 7:53 AM Jon Roberts <jroberts@pivotal.io> wrote:
>
> > JDBC PXF is pretty exciting.  Are there details on how this works?
> >
> > Does PXF randomly pick a segment to connect to the JDBC source or do you
> > specify a particular node to execute the query?  I'm assuming it isn't a
> > parallel query so you don't flood the JDBC source with x number of
> "select
> > *" queries but I could be wrong.
> >
> > How do you register the JDBC connection string with PXF?
> >
> > How do you manage credentials?
> >
> > How do you register the JDBC jar files?
> >
> >
> > Jon Roberts
> >
> > On Sat, Apr 1, 2017 at 5:05 PM, Ed Espino <espino@apache.org> wrote:
> >
> > > Thank you Michael, this is a well-deserved report highlight. I have
> added
> > > the following to the report:
> > >
> > >   3. Community contribution highlight(s):
> > >
> > >      * Leveraging the extensible PXF design, a JDBC PXF plugin
> > >        was contributed by Devin Jia (github id: jiadexin). This
> > >        contribution came from the community and not from the
> > >        company which originally donated HAWQ to the ASF.
> > >
> > > Warm regrds,
> > > -=e
> > >
> > > On Sat, Apr 1, 2017 at 1:14 PM, Michael André Pearce <
> > > michael.andre.pearce@me.com> wrote:
> > >
> > > >
> > > > I would add note about the new JDBC PXF plugin being contributed by
> > Devin
> > > > Jia. (Kudos to him)
> > > >
> > > > It's a good show of the community contributing and growing outside of
> > > just
> > > > Pivotal staff (I think for incubator review this is important) and
> also
> > > the
> > > > benefit of the extensible PXF design the project has done.
> >
> The information contained in this email is strictly confidential and for
> the use of the addressee only, unless otherwise indicated. If you are not
> the intended recipient, please do not read, copy, use or disclose to others
> this message or any attachment. Please also notify the sender by replying
> to this email or by telephone (+44(020 7896 0011) and then delete the email
> and any copies of it. Opinions, conclusion (etc) that do not relate to the
> official business of this company shall be understood as neither given nor
> endorsed by it. IG is a trading name of IG Markets Limited (a company
> registered in England and Wales, company number 04008957) and IG Index
> Limited (a company registered in England and Wales, company number
> 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
> London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
> Index Limited (register number 114059) are authorised and regulated by the
> Financial Conduct Authority.
>
The information contained in this email is strictly confidential and for the use of the addressee
only, unless otherwise indicated. If you are not the intended recipient, please do not read,
copy, use or disclose to others this message or any attachment. Please also notify the sender
by replying to this email or by telephone (+44(020 7896 0011) and then delete the email and
any copies of it. Opinions, conclusion (etc) that do not relate to the official business of
this company shall be understood as neither given nor endorsed by it. IG is a trading name
of IG Markets Limited (a company registered in England and Wales, company number 04008957)
and IG Index Limited (a company registered in England and Wales, company number 01190902).
Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both IG Markets
Limited (register number 195355) and IG Index Limited (register number 114059) are authorised
and regulated by the Financial Conduct Authority.

Mime
View raw message