From: Stefano Bortoli
Date: Wed, 23 Mar 2016 10:50:58 +0100
Subject: Re: Oracle 11g number serialization: classcast problem
To: user@flink.apache.org

Thanks for the clarification.
case java.sql.Types.DECIMAL:
    reuse.setField(resultSet.getBigDecimal(pos + 1).doubleValue(), pos);
    break;

This causes both a NullPointerException on null values and a ClassCastException when serializing the tuple.
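Just as a sketch of the alternative we did not take: a null guard alone avoids the NullPointerException, but the value is still a Double, so the ClassCastException remains whenever the TupleTypeInfo declares a different type for that position (the 0.0 default below is only an illustration):

case java.sql.Types.DECIMAL:
    // hypothetical null-safe variant that keeps the field as a double
    java.math.BigDecimal dec = resultSet.getBigDecimal(pos + 1);
    reuse.setField(dec == null ? 0.0 : dec.doubleValue(), pos);
    break;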

For the moment, because we have mostly a 'reading problem', we modified the InputFormat to get strings, and we output them as CSV:

case java.sql.Types.NUMERIC:
    if (resultSet.getBigDecimal(pos + 1) == null) reuse.setField("", pos);
    else reuse.setField(resultSet.getBigDecimal(pos + 1).toPlainString(), pos);
    break;
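Roughly, this is how the format is wired into a job that dumps the table to CSV, following the Derby example from the Flink docs. The driver, URL, credentials, query, output path and class name below are placeholders, and the exact builder signatures may differ between Flink versions:

import static org.apache.flink.api.common.typeinfo.BasicTypeInfo.STRING_TYPE_INFO;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;

public class OracleDumpJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // With the string-based NUMERIC handling above, numeric columns arrive as String.
        DataSet<Tuple2<String, String>> rows = env.createInput(
                JDBCInputFormat.buildJDBCInputFormat()
                        .setDrivername("oracle.jdbc.OracleDriver")
                        .setDBUrl("jdbc:oracle:thin:@//dbhost:1521/service") // placeholder
                        .setUsername("user")                                 // placeholder
                        .setPassword("secret")                               // placeholder
                        .setQuery("SELECT id, amount FROM big_table")        // placeholder
                        .finish(),
                new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, STRING_TYPE_INFO));

        rows.writeAsCsv("file:///tmp/big_table_dump"); // placeholder output path
        env.execute("Oracle table dump");
    }
}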

Another problem is that the reading is sequential and does not allow for splits. When we get a working version that is satisfying, we'll share the contribution. Our idea is to enable the execution of Sqoop scripts using Flink. We are testing it on an Oracle table of 11 billion records, but we did not get through a complete run yet. We are just at the first prototype level, so there is surely some work to do. :-)
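The boundary-query part is so far just plain JDBC; here is a rough sketch of the idea (table name, key column and helper class are illustrative, not our final code). Each returned predicate is then appended to the base query of one parallel reader:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class BoundarySplitter {

    /** Builds one "ID >= x AND ID < y" predicate per split from a MIN/MAX boundary query. */
    public static List<String> splitPredicates(String url, String user, String pwd,
                                               int numSplits) throws Exception {
        try (Connection conn = DriverManager.getConnection(url, user, pwd);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT MIN(ID), MAX(ID) FROM BIG_TABLE")) {
            rs.next();
            long min = rs.getLong(1);
            long max = rs.getLong(2);

            // Split the key range into roughly equal chunks, one predicate per chunk.
            long step = Math.max(1L, (max - min + 1) / numSplits);
            List<String> predicates = new ArrayList<>();
            for (long lower = min; lower <= max; lower += step) {
                long upper = Math.min(lower + step, max + 1);
                predicates.add("ID >= " + lower + " AND ID < " + upper);
            }
            return predicates;
        }
    }
}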

saluti,
Stefano

2016-03-23 10:38 GMT+01:00 Chesnay Schepler <chesnay@apache.org>:

> On 23.03.2016 10:04, Stefano Bortoli wrote:
> I had a look at the JDBC input format, and it does indeed interpret BIGDECIMAL and NUMERIC values as double.
> This sounds more like a bug actually. Feel free to open a JIRA for this.
> The status of the JDBCInputFormat is not adequate for real-world use cases, as for example it does not deal with NULL values.
> This was already reported in FLINK-3471. To clarify, for NULL fields the format fails only if the type is either DECIMAL, NUMERIC, DATE, TIME, TIMESTAMP, or SQLXML. Other types should default to 0, the empty string or false, which actually isn't intended behavior but is caused by JDBC itself.
>
> Defaulting to some value seems the only way to deal with this issue, since we can't store null in a Tuple.
>
> I wasn't sure what value DATE, TIME, TIMESTAMP and SQLXML should default to, so I didn't change them yet. I also just dislike the fact that a straight copy from A to B will not produce the same table.
> However, with little effort we fixed a few things and now we are getting to something usable. We are actually trying to do something a la Sqoop: given a boundary query, we create the splits and then assign them to the input format to read the database with configurable parallelism. We are still working on it. If we get to something stable and working, we'll gladly share it.
>
> saluti,
> Stefano

> 2016-03-22 15:46 GMT+01:00 Chesnay Schepler <chesnay@apache.org>:
>> The JDBC formats don't make any assumption as to what DB backend is used.
>>
>> A JDBC float in general is returned as a double, since that was the recommended mapping I found when I wrote the formats.
>>
>> Is the INT returned as a double as well?
>>
>> Note: The (runtime) output type is in no way connected to the TypeInfo you pass when constructing the format.


>> On 21.03.2016 14:16, Stefano Bortoli wrote:
>>> Hi squirrels,
>>>
>>> I am working on a Flink job connecting to an Oracle DB. I started from the JDBC example for Derby, and used the TupleTypeInfo to configure the fields of the tuple as it is read.
>>>
>>> The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR. Apparently, using Oracle, all the numbers are read as Double, causing a ClassCastException. Of course I can fix it by changing the TupleTypeInfo, but I wonder whether there is some assumption for Oracle and numbers.
>>>
>>> Thanks a lot for your support!
>>>
>>> saluti,
>>> Stefano



