apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: Changes to JdbcOutputOperator
Date Thu, 10 Mar 2016 12:04:43 GMT
Hi All,

Apart from extending the JDBC Output operator (for update, merge and
delete), I am planning the following change to the "*insert*" functionality:


   - It was pointed out that - if the incoming POJO has fields which are to
   be inserted into a database table, the user should not have to specify any
   kind of mapping between POJO columns and database table column names. This
   is possible if the POJO field names are same as the db column names.


   - Here is what I am planning to do to address this:
      - Have a flag which indicates "*No mapping required. Assume same POJO
      field names as the DB column names.*"
      - If this flag is true, extract DB column names (in addition to the
      types, which is done currently) and create getters from POJO based on the
      DB column types.
      - Use these getters to populate the parametrized "*insert*" query.


Note - Once this is done, the user would not have to specify any field
descriptions. Just the DB table name would be sufficient. This would only
be possible if the DB column names are the same as the field names in the
POJO.
Without this change, "*insert*" will be considered as just another DDL
query like "update" / "delete" / "merge". We accept a parametrized query
and a mapping from the user for POJO field names to DB column names + DB
column types.

Thanks.
Bhupesh


On Wed, Dec 30, 2015 at 12:56 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
wrote:

> Also, since we are not creating separate operators for insert and
> update/merge, it seems we don't need an abstract class. We can directly
> modify JdbcPOJOOutputOperator.
>
> -Bhupesh
>
> On Wed, Dec 30, 2015 at 11:17 AM, Bhupesh Chawda <bhupesh@datatorrent.com>
> wrote:
>
>> Also, a note regarding accepting fieldInfo objects:
>>
>> FieldInfo has fields for java type of the column, but not the SQL data
>> type which is needed to set the parameters in the SQL statement. This still
>> needs to be derived from the database (as in insert).
>> In more complex scenarios, as mentioned earlier, this may not always be
>> possible. In this case we have the following options:
>>
>>    1. Accept the SQL data types from the user for the parameters ("?")
>>    in a new data structure.
>>    2. Accept the JSON structure as specified earlier.
>>    3. Modify FieldInfo to have an additional optional field which
>>    indicates the SQL data type of the parameter ("?")
>>
>> I am planning to go ahead with  option 3.
>>
>> Comments?
>>
>> -Bhupesh
>>
>>
>>
>> On Mon, Dec 28, 2015 at 11:26 PM, Chandni Singh <chandni@datatorrent.com>
>> wrote:
>>
>>> Yeah that sounds good.
>>>
>>> Chandni
>>>
>>> On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <bhupesh@datatorrent.com
>>> >
>>> wrote:
>>>
>>> > Thanks Chandni for the comments.
>>> >
>>> > Additionally I think, we should do away with constructing the SQL
>>> query in
>>> > the operator. This is because it is only possible in case of simple
>>> insert
>>> > statements. In case a where clause is needed in the insert statement,
>>> we
>>> > cannot construct the SQL easily. If we are accepting a parametrized
>>> query
>>> > from the user for update/merge, why not do the same for insert
>>> statements
>>> > as well? Then the hierarchy would look like:
>>> > - AbstractJdbcPOJOOutputOperator (As suggested)
>>> > -- JdbcPOJOOutputOperator (Takes care of all statements insert, update,
>>> > merge, delete)
>>> >
>>> > We don't need a separate operator for inserts and update/merge.
>>> >
>>> > Thanks.
>>> > -Bhupesh
>>> >
>>> >
>>> > On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <
>>> chandni@datatorrent.com>
>>> > wrote:
>>> >
>>> > >   1. FieldInfo, is ultimately a custom object and any user who uses
>>> this
>>> > >    operator has to construct an object, populate it and then use it.
>>> We
>>> > are
>>> > >    trying to avoid using any custom object and allow any user to use
>>> the
>>> > >    operator without writing any extra code; just configuration.
>>> > > FieldInfo is a way to provide configuration in a UI friendly way.
>>> > Providing
>>> > > configuration as JSON is not UI friendly.
>>> > >
>>> > >    2. In case of update / merge, we need what SQL data types that the
>>> > >    expression provided by the user would evaluate to. In case of
>>> insert
>>> > we
>>> > > can
>>> > >    go and fetch the data types of the columns directly from DB.
>>> However,
>>> > > the
>>> > >    same is not possible for custom expressions; "avg(salary)" for
>>> > instance.
>>> > > Ok so here is where you can make a change.
>>> > > - JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
>>> > > - Abstraction is to fetch the type of column.
>>> > > - Add a concrete JDBCPojoInsertOutput  that derives the types of
>>> columns
>>> > > directly from DB. Please note that FieldInfo can also provide type
of
>>> > > derived column.
>>> > >
>>> > >
>>> > > You mentioned "We are trying to avoid using any custom object and
>>> allow
>>> > any
>>> > > user to use the
>>> > >    operator without writing any extra code".
>>> > > This I think is specific to your use case. You can create an
>>> extension of
>>> > > the above which takes JSON blob and creates FieldInfos from it.
>>> > >
>>> > > Chandni
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <
>>> bhupesh@datatorrent.com
>>> > >
>>> > > wrote:
>>> > >
>>> > > > Hi Chandni,
>>> > > >
>>> > > > Following are the issues:
>>> > > >
>>> > > >    1. FieldInfo, is ultimately a custom object and any user who
>>> uses
>>> > this
>>> > > >    operator has to construct an object, populate it and then use
>>> it. We
>>> > > are
>>> > > >    trying to avoid using any custom object and allow any user
to
>>> use
>>> > the
>>> > > >    operator without writing any extra code; just configuration.
>>> > > >    2. In case of update / merge, we need what SQL data types that
>>> the
>>> > > >    expression provided by the user would evaluate to. In case
of
>>> insert
>>> > > we
>>> > > > can
>>> > > >    go and fetch the data types of the columns directly from DB.
>>> > However,
>>> > > > the
>>> > > >    same is not possible for custom expressions; "avg(salary)"
for
>>> > > instance.
>>> > > >
>>> > > > Thanks.
>>> > > >
>>> > > > -Bhupesh
>>> > > >
>>> > > >
>>> > > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
>>> > chandni@datatorrent.com>
>>> > > > wrote:
>>> > > >
>>> > > > > Hi Bhupesh,
>>> > > > >
>>> > > > > JDBCPojoOutputOperator was written for a demo and therefore
it
>>> was
>>> > > marked
>>> > > > > Evolving which is why I had mentioned that you should feel
free
>>> to
>>> > > modify
>>> > > > > it.
>>> > > > >
>>> > > > > I think an insert query can be as complex as any other query.
It
>>> uses
>>> > > > > FieldInfo because in the app builder it is easy for the user
to
>>> > provide
>>> > > > > that instead of JSON String.
>>> > > > >
>>> > > > > Can you please provide specifics about what it is that you
find
>>> > > difficult
>>> > > > > to change/implement for providing update/merge/delete support
in
>>> > > > > JDBCPojoOutputOperator?
>>> > > > >
>>> > > > > Chandni
>>> > > > >
>>> > > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
>>> > > bhupesh@datatorrent.com
>>> > > > >
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Hi All,
>>> > > > > >
>>> > > > > > It has been pointed out that adding another class for
handling
>>> > > update /
>>> > > > > > merge queries would not be a good option.
>>> > > > > >
>>> > > > > > Here are the current implementation details:
>>> > > > > >
>>> > > > > >    - We have an existing class: JdbcPOJOOutputOperator
which
>>> > accepts
>>> > > a
>>> > > > > list
>>> > > > > >    of FieldInfo objects. Each element of this list indicates
>>> the
>>> > > > details
>>> > > > > >    (column name, pojo field expression and datatype)
of fields
>>> that
>>> > > > need
>>> > > > > > to be
>>> > > > > >    inserted. Using this, the operator formulates the
insert
>>> query
>>> > in
>>> > > > the
>>> > > > > > setup
>>> > > > > >    method and identifies the sql datatypes of these
columns
>>> from
>>> > the
>>> > > > > > database
>>> > > > > >    using the table name.
>>> > > > > >
>>> > > > > >
>>> > > > > >    - Now, coming to the update / merge feature, it is
>>> difficult to
>>> > > > > >    formulate the update / merge query in the operator
logic
>>> due to
>>> > > the
>>> > > > > > complex
>>> > > > > >    structure of these statements. For this reason, we
plan to
>>> take
>>> > a
>>> > > > > >    parametrized SQL query from the user. This may look
like:
>>> > *"update
>>> > > > > table
>>> > > > > >    set x = ?, y = ? where z + w > ? and a == 1;"*.
Such
>>> statements
>>> > > can
>>> > > > be
>>> > > > > >    accepted from the user in addition to a json string
which
>>> > > indicates
>>> > > > > the
>>> > > > > >    details for the parameters: *column name, the pojo
>>> expression
>>> > and
>>> > > > the
>>> > > > > >    sql data type* of the expression. Note that this
>>> information is
>>> > > > > similar
>>> > > > > >    to the FieldInfo object, but is a string which can
be
>>> configured
>>> > > > > easily
>>> > > > > > by
>>> > > > > >    the user. Also note that the data type which is accepted
is
>>> the
>>> > > SQL
>>> > > > > data
>>> > > > > >    type. Using this info we can populate the parametrized
>>> query and
>>> > > run
>>> > > > > it
>>> > > > > >    based on the incoming POJO.
>>> > > > > >
>>> > > > > > The second approach is able to handle all kinds of queries
>>> (insert
>>> > /
>>> > > > > update
>>> > > > > > / merge / delete). However, since we already have the
>>> > > > > > JdbcPOJOOutputOperator, we would like to merge the new
>>> > functionality
>>> > > > into
>>> > > > > > the same class.
>>> > > > > >
>>> > > > > > Here we have the following options:
>>> > > > > >
>>> > > > > >    1. Change the existing class (JdbcPOJOOutputOperator)
to the
>>> > > second
>>> > > > > >    approach which is more generic and also handles inserts.
>>> > > > > >    2. Add the update/ merge functionality to the existing
class
>>> > > without
>>> > > > > >    changing the existing functionality. This will have
two
>>> > different
>>> > > > ways
>>> > > > > > that
>>> > > > > >    insert queries may be handled in the operator.
>>> > > > > >    3. Add another class which extends from
>>> JdbcPOJOOutputOperator
>>> > and
>>> > > > > have
>>> > > > > >    the update/merge functionality there. (This is not
>>> recommended.)
>>> > > > > >    4. Any other approach.
>>> > > > > >
>>> > > > > > Please suggest.
>>> > > > > >
>>> > > > > > Thanks.
>>> > > > > >
>>> > > > > > -Bhupesh.
>>> > > > > >
>>> > > > > >
>>> > > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
>>> > > > chandni@datatorrent.com
>>> > > > > >
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > No I don't think we are restricting Malhar to just
abstract
>>> > > classes.
>>> > > > > > > Whenever they are couple of use cases that we see
quite
>>> often, we
>>> > > add
>>> > > > > > > concrete implementations.
>>> > > > > > >
>>> > > > > > > For eg. FileLineInputOperator which is a concrete
>>> implementation
>>> > of
>>> > > > > > > AbstractFileInputOperator. FSSliceReader is an
example as
>>> well.
>>> > > > > > >
>>> > > > > > > In AbstractJdbcOutputOperator case there hasn't
been such
>>> common
>>> > > > > > > insert/update query.
>>> > > > > > >
>>> > > > > > > Also if you look at the example I provided, it
is very
>>> simple to
>>> > > > > provide
>>> > > > > > a
>>> > > > > > > concrete implementation.
>>> > > > > > >
>>> > > > > > > If you would like to change JdbcPOJOOutputOperator
to work
>>> for
>>> > > > > > > "UPDATE/MERGE" then please go ahead.
>>> > > > > > >
>>> > > > > > > Chandni
>>> > > > > > >
>>> > > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda
<
>>> > > > > bhupesh@datatorrent.com
>>> > > > > > >
>>> > > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > I see. So, just to understand more, do we
plan to keep
>>> Malhar
>>> > > > > > restricted
>>> > > > > > > to
>>> > > > > > > > the base functionality (as in abstract classes)?
And put
>>> the
>>> > > > > > > configuration
>>> > > > > > > > aspect / concrete implementations in apps
that use these
>>> > > operators?
>>> > > > > > > >
>>> > > > > > > > Thanks.
>>> > > > > > > > Bhupesh
>>> > > > > > > >
>>> > > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh
<
>>> > > > > > chandni@datatorrent.com>
>>> > > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > > > Hi,
>>> > > > > > > > >
>>> > > > > > > > > Here is an example of doing Upsert with
JDBC:
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
>>> > > > > > > > >
>>> > > > > > > > > Thanks,
>>> > > > > > > > > Chandni
>>> > > > > > > > >
>>> > > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni
Singh <
>>> > > > > > > chandni@datatorrent.com
>>> > > > > > > > >
>>> > > > > > > > > wrote:
>>> > > > > > > > >
>>> > > > > > > > > > The operators are under Malhar/lib/db/jdbc.
>>> > > > > > > > > >
>>> > > > > > > > > > Here is one of them:
>>> > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
>>> > > > > > > > > >
>>> > > > > > > > > > They work with any kind PreparedStatement
- insert or
>>> > update
>>> > > > > > > > > >
>>> > > > > > > > > > Chandni
>>> > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM,
Bhupesh Chawda <
>>> > > > > > > > > bhupesh@datatorrent.com>
>>> > > > > > > > > > wrote:
>>> > > > > > > > > >
>>> > > > > > > > > >> Hi Chandni,
>>> > > > > > > > > >>
>>> > > > > > > > > >> I don't see an update query
being handled in the
>>> operator.
>>> > > > Could
>>> > > > > > you
>>> > > > > > > > > >> please
>>> > > > > > > > > >> point me to the appropriate
class?
>>> > > > > > > > > >> Or did you mean that handling
a update query is just a
>>> > > matter
>>> > > > of
>>> > > > > > > > > extending
>>> > > > > > > > > >> the class and providing a concrete
implementation?
>>> > > > > > > > > >>
>>> > > > > > > > > >> Thanks.
>>> > > > > > > > > >> -Bhupesh
>>> > > > > > > > > >>
>>> > > > > > > > > >> On Fri, Dec 11, 2015 at 10:42
PM, Chandni Singh <
>>> > > > > > > > > chandni@datatorrent.com>
>>> > > > > > > > > >> wrote:
>>> > > > > > > > > >>
>>> > > > > > > > > >> > Hi Bhupesh,
>>> > > > > > > > > >> >
>>> > > > > > > > > >> > The current abstract JDBC
Output Operators  in
>>> library
>>> > are
>>> > > > > > generic
>>> > > > > > > > and
>>> > > > > > > > > >> have
>>> > > > > > > > > >> > already been used in multiple
POCs and
>>> applications. In
>>> > > fact
>>> > > > > > this
>>> > > > > > > > > >> operator
>>> > > > > > > > > >> > has matured through customer
use cases. It is not
>>> just
>>> > an
>>> > > > > insert
>>> > > > > > > > > >> operator.
>>> > > > > > > > > >> > We have used it to perform
update and inserts.
>>> > > > > > > > > >> >
>>> > > > > > > > > >> > That said, I don't think
it is a good idea to
>>> introduce
>>> > > > input
>>> > > > > > > format
>>> > > > > > > > > in
>>> > > > > > > > > >> >  these abstract implementations.
It is written to
>>> handle
>>> > > any
>>> > > > > > type
>>> > > > > > > of
>>> > > > > > > > > >> query,
>>> > > > > > > > > >> > be it a procedure call
(that was an actual customer
>>> use
>>> > > > case).
>>> > > > > > > > > >> >
>>> > > > > > > > > >> > Chandni
>>> > > > > > > > > >> >
>>> > > > > > > > > >> > On Fri, Dec 11, 2015 at
2:50 AM, Bhupesh Chawda <
>>> > > > > > > > > >> bhupesh@datatorrent.com>
>>> > > > > > > > > >> > wrote:
>>> > > > > > > > > >> >
>>> > > > > > > > > >> > > Hi All,
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > We are planning to
proceed with the following
>>> approach
>>> > > for
>>> > > > > > JDBC
>>> > > > > > > > > >> *update*
>>> > > > > > > > > >> > > operator:
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >    - *Update Query
Configuration*
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >    - Example Update
Query: *update tableName set
>>> a =
>>> > ?**
>>> > > > > > where b
>>> > > > > > > > = ?
>>> > > > > > > > > >> and
>>> > > > > > > > > >> > c
>>> > > > > > > > > >> > >    > ?;*
>>> > > > > > > > > >> > >    - Example JSON
input array for parameter
>>> > > > instantiations:
>>> > > > > > > *[{a,
>>> > > > > > > > > >> > >    expression, INTEGER},
{b, expression,
>>> VARCHAR}, {c,
>>> > > > > > > expression,
>>> > > > > > > > > >> > DATE}]*
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > We are also planning
to change the JDBC Output
>>> > Operator
>>> > > in
>>> > > > > > > Malhar
>>> > > > > > > > > >> Library
>>> > > > > > > > > >> > > which currently does
just insert. We plan to make
>>> the
>>> > > > input
>>> > > > > > > format
>>> > > > > > > > > >> > > consistent for both
insert and update and hence
>>> the
>>> > > change
>>> > > > > to
>>> > > > > > > the
>>> > > > > > > > > >> current
>>> > > > > > > > > >> > > way of configuration
using JSON. Following would
>>> be
>>> > the
>>> > > > > config
>>> > > > > > > for
>>> > > > > > > > > >> > inserts:
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >    - *Insert Query
Configuration*
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >    - Example Insert
Query: *insert into tableName
>>> > values
>>> > > > (?,
>>> > > > > > ?,
>>> > > > > > > > .. ,
>>> > > > > > > > > >> ?);*
>>> > > > > > > > > >> > >    - Example JSON
input array for parameter
>>> > > > instantiations:
>>> > > > > > > *[{a,
>>> > > > > > > > > >> > >    expression, INTEGER},
{b, expression,
>>> VARCHAR}, ..
>>> > ,
>>> > > > {c,
>>> > > > > > > > > >> expression,
>>> > > > > > > > > >> > > DATE}]*
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > Please let us know
your thoughts.
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > Thanks.
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > -Bhupesh
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > On Wed, Dec 9, 2015
at 6:38 PM, Bhupesh Chawda <
>>> > > > > > > > > >> bhupesh@datatorrent.com>
>>> > > > > > > > > >> > > wrote:
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > > Hi All,
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > > Would it be a
good idea to introduce the update
>>> > > > > > functionality
>>> > > > > > > to
>>> > > > > > > > > the
>>> > > > > > > > > >> > JDBC
>>> > > > > > > > > >> > > > output operator
in Apache Apex Malhar library.
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > > The following
are possible approaches:
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > >    1. Accept
a update query from the user with
>>> place
>>> > > > > holders
>>> > > > > > > for
>>> > > > > > > > > >> > values.
>>> > > > > > > > > >> > > >    Example: *update
tableName set a = ?, b = ?
>>> > where c
>>> > > > = ?
>>> > > > > > and
>>> > > > > > > > d >
>>> > > > > > > > > >> ?*.
>>> > > > > > > > > >> > > >    Here "?" will
be provided by the user as java
>>> > > > > expressions
>>> > > > > > > > which
>>> > > > > > > > > >> will
>>> > > > > > > > > >> > > be
>>> > > > > > > > > >> > > >    evaluated
from the incoming tuple.
>>> > > > > > > > > >> > > >    2. Another
option is to accept in some
>>> > > configuration
>>> > > > > > format
>>> > > > > > > > > >> (json /
>>> > > > > > > > > >> > > >    xml) the following
and formulate the query
>>> in the
>>> > > > > > operator.
>>> > > > > > > > > This
>>> > > > > > > > > >> can
>>> > > > > > > > > >> > > become
>>> > > > > > > > > >> > > >    arbitrarily
complex.
>>> > > > > > > > > >> > > >    1. update
clause columns
>>> > > > > > > > > >> > > >       2. update
clause expressions
>>> > > > > > > > > >> > > >       3. where
clause columns
>>> > > > > > > > > >> > > >       4. where
clause expressions
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > > I am thinking
about going ahead with 1. Please
>>> let
>>> > me
>>> > > > know
>>> > > > > > if
>>> > > > > > > > any
>>> > > > > > > > > >> other
>>> > > > > > > > > >> > > > option is possible
and whether such a
>>> functionality
>>> > > > > already
>>> > > > > > > > exists
>>> > > > > > > > > >> in
>>> > > > > > > > > >> > > some
>>> > > > > > > > > >> > > > other class.
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > > Thanks.
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > > -Bhupesh
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> >
>>> > > > > > > > > >>
>>> > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message