flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefano Bortoli <s.bort...@gmail.com>
Subject Re: Oracle 11g number serialization: classcast problem
Date Wed, 23 Mar 2016 09:50:58 GMT
Thanks for the clarification.

case java.sql.Types.DECIMAL:
                    reuse.setField(resultSet.getBigDecimal(pos +
1).doubleValue(), pos);
                    break;

this causes both a nullpointer on null values as well as a double class
cast exception when serializing the tuple.

For the moment, because we have mostly a 'reading problem', we modified the
inputformat to get strings and we output them as CSV.

case java.sql.Types.NUMERIC:
                    if(resultSet.getBigDecimal(pos + 1)==null)
reuse.setField("", pos);
                    else reuse.setField(resultSet.getBigDecimal(pos +
1).toPlainString(), pos);
                    break;

Another problem is that the reading is sequential and does not allow for
splits. When we get a working version that is satisfying, we'll share the
contribution. Our idea is to enable the execution of Sqoop scripts using
Flink. We are testing it on a Oracle table of 11 billion records, but we
did not get through a complete run. We are just at first prototype level,
so there is surely some work to do. :-)

saluti,
Stefano



2016-03-23 10:38 GMT+01:00 Chesnay Schepler <chesnay@apache.org>:

> On 23.03.2016 10:04, Stefano Bortoli wrote:
>
> I had a look at the JDBC input format, and it does indeed interpret
> BIGDECIMAL and NUMERIC values as double.
>
> This sounds more like a bug actually. Feel free to open a JIRA for this.
>
> The status of the JDBCInputFormat is not adequate for real world use case,
> as for example does not deal with NULL values.
>
> This was already reported in FLINK-3471. To clarify, for NULL fields the
> format fails only if the type is either DECIMAL, NUMERIC, DATE, TIME,
> TIMESTAMP, or SQLXML. Other types should default to 0, empty string or
> false; which actually isn't intended behavior, but caused by JDBC itself.
>
> Defaulting to some value seems the only way to deal with this issue, since
> we can't store null in a Tuple.
>
> I wasn't sure what value DATE, TIME, TIMESTAMP and SQLXML should default
> to, as such i didn't change them yet. I also just dislike the fact that a
> straight copy from A to B will not produce the same table.
>
> However, with little effort we fixed few stuff and now we are getting to
> something usable. We are actually trying to do something a-la sqoop,
> therefore given a boundary query, we create the splits, and then assign it
> to the input format to read the database with configurable parallelism. We
> are still working on it. If we get to something stable and working, we'll
> gladly share it.
>
> saluti,
> Stefano
>
> 2016-03-22 15:46 GMT+01:00 Chesnay Schepler <chesnay@apache.org>:
>
>> The JDBC formats don't make any assumption as to what DB backend is used.
>>
>> A JDBC float in general is returned as a double, since that was the
>> recommended mapping i found when i wrote the formats.
>>
>> Is the INT returned as a double as well?
>>
>> Note: The (runtime) output type is in no way connected to the TypeInfo
>> you pass when constructing the format.
>>
>>
>> On 21.03.2016 14:16, Stefano Bortoli wrote:
>>
>>> Hi squirrels,
>>>
>>> I working on a flink job connecting to a Oracle DB. I started from the
>>> JDBC example for Derby, and used the TupleTypeInfo to configure the fields
>>> of the tuple as it is read.
>>>
>>> The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR. Apparently,
>>> using Oracle, all the numbers are read as Double, causing a ClassCast
>>> exception. Of course I can fix it by changing the TupleTypeInfo, but I
>>> wonder whether there is some assumption for Oracle and Numbers.
>>>
>>> Thanks a lot for your support!
>>>
>>> saluti,
>>> Stefano
>>>
>>
>>
>
>

Mime
View raw message