From: Stefano Bortoli
Date: Wed, 23 Mar 2016 10:50:58 +0100
Subject: Re: Oracle 11g number serialization: classcast problem
To: user@flink.apache.org

Thanks for the clarification.
case java.sql.Types.DECIMAL:
    reuse.setField(resultSet.getBigDecimal(pos + 1).doubleValue(), pos);
    break;

This causes both a NullPointerException on null values and a ClassCastException when serializing the tuple.
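Just as a sketch of the alternative we did not take: a null guard alone avoids the NullPointerException, but the value is still a Double, so the ClassCastException remains whenever the TupleTypeInfo declares a different type for that position (the 0.0 default below is only an illustration):

case java.sql.Types.DECIMAL:
    // hypothetical null-safe variant that keeps the field as a double
    java.math.BigDecimal dec = resultSet.getBigDecimal(pos + 1);
    reuse.setField(dec == null ? 0.0 : dec.doubleValue(), pos);
    break;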

For the moment, because we have mostly a 'reading problem', we modified the InputFormat to get strings, and we output them as CSV:

case java.sql.Types.NUMERIC:
    if (resultSet.getBigDecimal(pos + 1) == null) reuse.setField("", pos);
    else reuse.setField(resultSet.getBigDecimal(pos + 1).toPlainString(), pos);
    break;
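Roughly, this is how the format is wired into a job that dumps the table to CSV, following the Derby example from the Flink docs. The driver, URL, credentials, query, output path and class name below are placeholders, and the exact builder signatures may differ between Flink versions:

import static org.apache.flink.api.common.typeinfo.BasicTypeInfo.STRING_TYPE_INFO;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;

public class OracleDumpJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // With the string-based NUMERIC handling above, numeric columns arrive as String.
        DataSet<Tuple2<String, String>> rows = env.createInput(
                JDBCInputFormat.buildJDBCInputFormat()
                        .setDrivername("oracle.jdbc.OracleDriver")
                        .setDBUrl("jdbc:oracle:thin:@//dbhost:1521/service") // placeholder
                        .setUsername("user")                                 // placeholder
                        .setPassword("secret")                               // placeholder
                        .setQuery("SELECT id, amount FROM big_table")        // placeholder
                        .finish(),
                new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, STRING_TYPE_INFO));

        rows.writeAsCsv("file:///tmp/big_table_dump"); // placeholder output path
        env.execute("Oracle table dump");
    }
}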

Another problem is that the reading is sequential and does not allow for splits. When we get a working version that is satisfying, we'll share the contribution. Our idea is to enable the execution of Sqoop scripts using Flink. We are testing it on an Oracle table of 11 billion records, but we did not get through a complete run yet. We are just at the first prototype level, so there is surely some work to do. :-)
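The boundary-query part is so far just plain JDBC; here is a rough sketch of the idea (table name, key column and helper class are illustrative, not our final code). Each returned predicate is then appended to the base query of one parallel reader:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class BoundarySplitter {

    /** Builds one "ID >= x AND ID < y" predicate per split from a MIN/MAX boundary query. */
    public static List<String> splitPredicates(String url, String user, String pwd,
                                               int numSplits) throws Exception {
        try (Connection conn = DriverManager.getConnection(url, user, pwd);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT MIN(ID), MAX(ID) FROM BIG_TABLE")) {
            rs.next();
            long min = rs.getLong(1);
            long max = rs.getLong(2);

            // Split the key range into roughly equal chunks, one predicate per chunk.
            long step = Math.max(1L, (max - min + 1) / numSplits);
            List<String> predicates = new ArrayList<>();
            for (long lower = min; lower <= max; lower += step) {
                long upper = Math.min(lower + step, max + 1);
                predicates.add("ID >= " + lower + " AND ID < " + upper);
            }
            return predicates;
        }
    }
}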

saluti,
Stefano

2016-03-23 10:38 GMT+01:00 Chesnay Schepler <chesnay@apache.org>:

> On 23.03.2016 10:04, Stefano Bortoli wrote:
> I had a look at the JDBC input format, and it does indeed interpret BIGDECIMAL and NUMERIC values as double.
> This sounds more like a bug actually. Feel free to open a JIRA for this.
> The status of the JDBCInputFormat is not adequate for real-world use cases, as for example it does not deal with NULL values.
> This was already reported in FLINK-3471. To clarify, for NULL fields the format fails only if the type is either DECIMAL, NUMERIC, DATE, TIME, TIMESTAMP, or SQLXML. Other types should default to 0, the empty string or false, which actually isn't intended behavior but is caused by JDBC itself.
>
> Defaulting to some value seems the only way to deal with this issue, since we can't store null in a Tuple.
>
> I wasn't sure what value DATE, TIME, TIMESTAMP and SQLXML should default to, so I didn't change them yet. I also just dislike the fact that a straight copy from A to B will not produce the same table.
> However, with little effort we fixed a few things and now we are getting to something usable. We are actually trying to do something a la Sqoop: given a boundary query, we create the splits and then assign them to the input format to read the database with configurable parallelism. We are still working on it. If we get to something stable and working, we'll gladly share it.
>
> saluti,
> Stefano

> 2016-03-22 15:46 GMT+01:00 Chesnay Schepler <chesnay@apache.org>:
>> The JDBC formats don't make any assumption as to what DB backend is used.
>>
>> A JDBC float in general is returned as a double, since that was the recommended mapping I found when I wrote the formats.
>>
>> Is the INT returned as a double as well?
>>
>> Note: The (runtime) output type is in no way connected to the TypeInfo you pass when constructing the format.


>> On 21.03.2016 14:16, Stefano Bortoli wrote:
>>> Hi squirrels,
>>>
>>> I am working on a Flink job connecting to an Oracle DB. I started from the JDBC example for Derby, and used the TupleTypeInfo to configure the fields of the tuple as it is read.
>>>
>>> The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR. Apparently, using Oracle, all the numbers are read as Double, causing a ClassCastException. Of course I can fix it by changing the TupleTypeInfo, but I wonder whether there is some assumption for Oracle and numbers.
>>>
>>> Thanks a lot for your support!
>>>
>>> saluti,
>>> Stefano



