flink-dev mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: FLINK-3750 (JDBCInputFormat)
Date Thu, 14 Apr 2016 15:22:12 GMT
Hi Flavio,

those are good questions.

1) Replacing null values with default values and simply forwarding the
records is very dangerous, in my opinion.
I see two alternatives: A) We use a data type that tolerates null values.
This could be a POJO that the user has to provide, or Row. The drawback of
Row is that it is untyped and not easy to handle. B) We use Tuple and add
an additional Integer field that serves as a bitset to mark null fields.
This would be a pretty low-level API, though. I am leaning towards the
user-provided POJO option.
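
To make option B concrete, here is a minimal sketch of how an extra Integer
field could act as a bitset marking null fields. NullMask and its methods
are hypothetical names chosen for illustration; they are not part of Flink
or of the PR, and a single int limits the scheme to 32 fields.

```java
// Hypothetical helper (not a Flink class): packs per-field null flags into one int.
final class NullMask {
    private NullMask() {}

    /** Returns a mask in which bit i is set when values[i] is null. */
    static int of(Object... values) {
        if (values.length > Integer.SIZE) {
            throw new IllegalArgumentException("an int mask supports at most 32 fields");
        }
        int mask = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                mask |= 1 << i;
            }
        }
        return mask;
    }

    /** Tells whether the field at the given position was marked as null. */
    static boolean isNull(int mask, int field) {
        return (mask & (1 << field)) != 0;
    }

    public static void main(String[] args) {
        // Field 1 is null, so only bit 1 of the mask is set.
        int mask = of("okkam", null, 42);
        System.out.println("field 0 null? " + isNull(mask, 0));
        System.out.println("field 1 null? " + isNull(mask, 1));
    }
}
```

A reader of such a Tuple would have to consult the mask before trusting any
field, which is why this reads as a low-level API compared to a POJO with
nullable fields.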

2) The JDBCInputFormat is located in a dedicated Maven module. I think we
can add a dependency to that module. However, it should also be possible to
reuse the same connection of an InputFormat across InputSplits, i.e., calls
of the open() method. Wouldn't that be sufficient?
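
The idea of reusing one connection across calls of open() could look roughly
like the sketch below. ReusingJdbcReader, its connectionFactory parameter
and closeConnection() are assumptions made for illustration; this is not the
actual JDBCInputFormat code, and it leaves out split handling entirely.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical sketch (not the real JDBCInputFormat): keeps one connection
// alive across successive open() calls instead of reconnecting per split.
final class ReusingJdbcReader {
    // Assumed factory, e.g. () -> DriverManager.getConnection(url).
    private final Callable<Connection> connectionFactory;
    // In a serializable input format this field would have to be transient.
    private Connection connection;

    ReusingJdbcReader(Callable<Connection> connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    /** Called once per split; connects only if no usable connection exists. */
    Connection open() throws Exception {
        if (connection == null || connection.isClosed()) {
            connection = connectionFactory.call();
        }
        return connection;
    }

    /** Called once when the reader is discarded, not after every split. */
    void closeConnection() throws SQLException {
        if (connection != null) {
            connection.close();
            connection = null;
        }
    }
}
```

With this shape, each parallel instance would pay the connection cost once
rather than once per split, without pulling in a pooling library.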

Best, Fabian

2016-04-14 16:59 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:

> Hi guys,
>
> I'm integrating Chesnay's comments into my PR, but there are a couple of
> things that I'd like to discuss with the core developers.
>
>
>    1. About the JDBC type mapping (addValue() method at [1]): at the moment,
>    if I find a null value for a Double, the getDouble() of JDBC returns 0.0.
>    Is that really the correct behaviour? Wouldn't it be better to use a POJO
>    or the Row of datatable that can handle nulls? Moreover, the mapping
>    between SQL types and Java types varies a lot from one JDBC
>    implementation to another. Wouldn't it be better to rely on the Java type
>    returned by resultSet.getObject() to get such a mapping rather than using
>    the ResultSetMetaData types?
>    2. I'd like to handle connections very efficiently because we have a use
>    case with billions of records and thus millions of splits, and
>    establishing a new connection each time could be expensive. Would it be a
>    problem to add an Apache pool dependency to the JDBC batch connector in
>    order to reuse the created connections?
>
>
> [1]
>
> https://github.com/fpompermaier/flink/blob/FLINK-3750/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java
>
