flink-dev mailing list archives

From Chesnay Schepler <ches...@apache.org>
Subject Re: FLINK-3750 (JDBCInputFormat)
Date Thu, 14 Apr 2016 15:28:04 GMT
On 14.04.2016 17:22, Fabian Hueske wrote:
> Hi Flavio,
> those are good questions.
> 1) Replacing null values with default values and simply forwarding records is
> very dangerous, in my opinion.
> I see two alternatives: A) We use a data type that tolerates null values.
> This could be a POJO that the user has to provide, or Row. The drawback of
> Row is that it is untyped and not easy to handle. B) We use Tuple and add
> an additional field that holds an Integer which serves as a bitset to mark
> null fields. This would be a pretty low-level API, though. I am leaning
> towards the user-provided POJO option.
I would also lean towards the POJO option.
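For comparison, option B (a Tuple plus an Integer bitset marking null fields) could look roughly like the sketch below. It is a hypothetical illustration in plain Java, not Flink's API: the class and method names are made up, and an int mask only covers up to 32 fields.

```java
/**
 * Hypothetical sketch of option B: bit i of an int mask is set when field i
 * of the record is SQL NULL. Not part of Flink; names are illustrative.
 */
public class NullMaskExample {

    // Sets bit `field` in the mask to mark that field as null.
    static int markNull(int mask, int field) {
        return mask | (1 << field);
    }

    // Returns true if bit `field` is set, i.e. the field was null.
    static boolean isNull(int mask, int field) {
        return (mask & (1 << field)) != 0;
    }

    // Builds the mask for the column values of one row.
    static int buildMask(Object[] values) {
        if (values.length > Integer.SIZE) {
            throw new IllegalArgumentException("an int mask covers at most 32 fields");
        }
        int mask = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                mask = markNull(mask, i);
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        Object[] row = {42, null, "abc", null};
        int mask = buildMask(row);
        System.out.println(Integer.toBinaryString(mask)); // 1010
        for (int i = 0; i < row.length; i++) {
            System.out.println("field " + i + " null? " + isNull(mask, i));
        }
    }
}
```

Every consumer has to decode the mask by field index, which is the "low-level" cost mentioned above; a user-provided POJO with boxed (nullable) fields avoids that.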
> 2) The JDBCInputFormat is located in a dedicated Maven module. I think we
> can add a dependency to that module. However, it should also be possible to
> reuse the same connection of an InputFormat across InputSplits, i.e., across
> calls of the open() method. Wouldn't that be sufficient?
This is the right approach, IMO.
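A minimal sketch of that idea, assuming one InputFormat instance processes its splits one after another through repeated open()/close() calls: the connection is created lazily on the first open() and only released once all splits are done. The class and method names (ReusedConnectionFormat, closeConnection) are hypothetical, and the type parameter C stands in for java.sql.Connection so the sketch runs without a JDBC driver.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch: one connection per InputFormat instance, reused across
 * the open() calls for successive splits. Not Flink's API; C stands in for
 * java.sql.Connection.
 */
public class ReusedConnectionFormat<C extends AutoCloseable> {
    private final Supplier<C> connectionFactory;
    private C connection;          // created lazily, then reused
    private int connectionsOpened; // counts how often we actually connected

    public ReusedConnectionFormat(Supplier<C> connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    // Called once per split; connects only if no connection exists yet.
    public void open(int splitId) {
        if (connection == null) {
            connection = connectionFactory.get();
            connectionsOpened++;
        }
        // ... prepare and run the query for this split on `connection` ...
    }

    // Called after each split; releases per-split resources only
    // (statement, result set) and deliberately keeps the connection open.
    public void close() {
    }

    // Called once, after the last split has been processed.
    public void closeConnection() throws Exception {
        if (connection != null) {
            connection.close();
            connection = null;
        }
    }

    public int connectionsOpened() {
        return connectionsOpened;
    }

    public static void main(String[] args) throws Exception {
        ReusedConnectionFormat<AutoCloseable> format =
            new ReusedConnectionFormat<AutoCloseable>(
                () -> () -> System.out.println("connection closed"));
        for (int split = 0; split < 1000; split++) {
            format.open(split);
            format.close();
        }
        format.closeConnection();
        System.out.println("connections opened: " + format.connectionsOpened());
    }
}
```

With 1000 splits this connects once instead of 1000 times, which is the effect a pool would give in the single-task case without adding a dependency.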
> Best, Fabian
> 2016-04-14 16:59 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>> Hi guys,
>> I'm integrating Chesnay's comments into my PR, but there are a couple of
>> things that I'd like to discuss with the core developers.
>>     1. About the JDBC type mapping (the addValue() method at [1]): at the
>>     moment, if I find a null value for a Double, JDBC's getDouble() returns
>>     0.0. Is that really the correct behaviour? Wouldn't it be better to use a
>>     POJO or the Row type of the Table API, which can handle null values?
>>     Moreover, the mapping between SQL types and Java types varies
>>     considerably between JDBC implementations. Wouldn't it be better to rely
>>     on the Java type returned by resultSet.getObject() to get such a mapping,
>>     rather than on the ResultSetMetaData types?
>>     2. I'd like to handle connections very efficiently because we have a use
>>     case with billions of records and thus millions of splits, and
>>     establishing a new connection each time could be expensive. Would it be a
>>     problem to add an Apache pool dependency to the JDBC batch connector in
>>     order to reuse the created connections?
>> [1]
>> https://github.com/fpompermaier/flink/blob/FLINK-3750/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java
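On the null question in point 1: in plain JDBC, ResultSet.getDouble() returns 0.0 for SQL NULL, so the only way to tell the two apart is to call wasNull() immediately after the read, while getObject() returns null directly and lets the driver pick the Java type. The sketch below shows both; the helper names are hypothetical, and a dynamic proxy fakes a ResultSet positioned on a NULL value so it runs without a database or driver.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Sketch of null-aware column reads with plain JDBC. Helper names are
 * hypothetical; getDouble, wasNull and getObject are standard ResultSet calls.
 */
public final class NullSafeReads {
    private NullSafeReads() {}

    // getDouble returns 0.0 for SQL NULL, so check wasNull() right after
    // the read and map that case to a Java null.
    static Double getNullableDouble(ResultSet rs, int column) throws SQLException {
        double value = rs.getDouble(column);
        return rs.wasNull() ? null : value;
    }

    // getObject lets the driver choose the Java type; SQL NULL comes back as null.
    static Object getDriverMappedValue(ResultSet rs, int column) throws SQLException {
        return rs.getObject(column);
    }

    // Test double: a ResultSet whose current column value is SQL NULL.
    static ResultSet fakeNullResultSet() {
        return (ResultSet) Proxy.newProxyInstance(
            NullSafeReads.class.getClassLoader(),
            new Class<?>[] {ResultSet.class},
            (proxy, method, methodArgs) -> {
                switch (method.getName()) {
                    case "getDouble": return 0.0;  // per the getDouble Javadoc for NULL
                    case "wasNull":   return true;
                    case "getObject": return null;
                    default: throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    public static void main(String[] args) throws SQLException {
        ResultSet rs = fakeNullResultSet();
        System.out.println(rs.getDouble(1));             // 0.0
        System.out.println(getNullableDouble(rs, 1));    // null
        System.out.println(getDriverMappedValue(rs, 1)); // null
    }
}
```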
