hawq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jon Roberts (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HAWQ-1445) PXF JDBC Security, Extra Props, and Max Queries
Date Thu, 27 Apr 2017 14:01:04 GMT
Jon Roberts created HAWQ-1445:

             Summary: PXF JDBC Security, Extra Props, and Max Queries
                 Key: HAWQ-1445
                 URL: https://issues.apache.org/jira/browse/HAWQ-1445
             Project: Apache HAWQ
          Issue Type: Improvement
          Components: PXF
            Reporter: Jon Roberts
            Assignee: Ed Espino

1) Security
The example has the username and password in the connection string.  

LOCATION ('pxf://localhost:51200/demodb.myclass'

This creates  security issue because anyone that can connect to the database will be able
to see the username and password of the JDBC connection.

I suggest changing the URL to a connection profile that points to a file outside of the database.
 For Greenplum database and S3, the LOCATION syntax includes "config=/path/to/config_file".
 The config_file contains the S3 credentials.  This seems like a good pattern to follow here

2) Extra Properties
Some JDBC drivers will need many additional properties beyond the URL and this requires setting
it with a put to a Properties variable.  An example of this is Oracle's defaultRowPrefetch
property that needs to be updated from the default of 10 which is designed for OLTP to something
larger like 2000 which is more ideal for data extracts.  

Additionally, you will need the ability to set the isolation level which is done with setTransactionIsolation
on the Connection.  I don't believe you can set this on the connection URL either.  Many SQL
Server and DB2 database still don't use snapshot isolation and use dirty reads instead to
prevent blocking locks.  The configuration file I suggested above will need an "extra properties"
variable that is a delimited list of key/value pairs so you can add multiple extra properties.

3) Max Queries
The external table definition doesn't limit how many concurrent queries can be executed on
the remote server.  It would be pretty simple to create a single external table using PXF
JDBC that would issue thousands of concurrent queries to a single source database when doing
a single SELECT in HAWQ.

Initially, we should add a max_queries variable to the configuration file that I'm suggesting,
that will reject queries from proceeding when a greater number of PXF instances are being
requested than the max_queries variable.  Longer term, we should implement a queueing system
so we can support external tables that partitions data from the source at a very small grain
but without killing the source database.

This message was sent by Atlassian JIRA

View raw message