apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: Changes to JdbcOutputOperator
Date Mon, 28 Dec 2015 09:09:00 GMT
Hi All,

It has been pointed out that adding another class for handling update /
merge queries would not be a good option.

Here are the current implementation details:

   - We have an existing class: JdbcPOJOOutputOperator which accepts a list
   of FieldInfo objects. Each element of this list indicates the details
   (column name, pojo field expression and datatype) of fields that need to be
   inserted. Using this, the operator formulates the insert query in the setup
   method and identifies the sql datatypes of these columns from the database
   using the table name.


   - Now, coming to the update / merge feature, it is difficult to
   formulate the update / merge query in the operator logic due to the complex
   structure of these statements. For this reason, we plan to take a
   parametrized SQL query from the user. This may look like: *"update table
   set x = ?, y = ? where z + w > ? and a == 1;"*. Such statements can be
   accepted from the user in addition to a json string which indicates the
   details for the parameters: *column name, the pojo expression and the
   sql data type* of the expression. Note that this information is similar
   to the FieldInfo object, but is a string which can be configured easily by
   the user. Also note that the data type which is accepted is the SQL data
   type. Using this info we can populate the parametrized query and run it
   based on the incoming POJO.

The second approach is able to handle all kinds of queries (insert / update
/ merge / delete). However, since we already have the
JdbcPOJOOutputOperator, we would like to merge the new functionality into
the same class.

Here we have the following options:

   1. Change the existing class (JdbcPOJOOutputOperator) to the second
   approach which is more generic and also handles inserts.
   2. Add the update/ merge functionality to the existing class without
   changing the existing functionality. This will have two different ways that
   insert queries may be handled in the operator.
   3. Add another class which extends from JdbcPOJOOutputOperator and have
   the update/merge functionality there. (This is not recommended.)
   4. Any other approach.

Please suggest.

Thanks.

-Bhupesh.


On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <chandni@datatorrent.com>
wrote:

> No I don't think we are restricting Malhar to just abstract classes.
> Whenever they are couple of use cases that we see quite often, we add
> concrete implementations.
>
> For eg. FileLineInputOperator which is a concrete implementation of
> AbstractFileInputOperator. FSSliceReader is an example as well.
>
> In AbstractJdbcOutputOperator case there hasn't been such common
> insert/update query.
>
> Also if you look at the example I provided, it is very simple to provide a
> concrete implementation.
>
> If you would like to change JdbcPOJOOutputOperator to work for
> "UPDATE/MERGE" then please go ahead.
>
> Chandni
>
> On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
> wrote:
>
> > I see. So, just to understand more, do we plan to keep Malhar restricted
> to
> > the base functionality (as in abstract classes)? And put the
> configuration
> > aspect / concrete implementations in apps that use these operators?
> >
> > Thanks.
> > Bhupesh
> >
> > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <chandni@datatorrent.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Here is an example of doing Upsert with JDBC:
> > >
> > >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > >
> > > Thanks,
> > > Chandni
> > >
> > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> chandni@datatorrent.com
> > >
> > > wrote:
> > >
> > > > The operators are under Malhar/lib/db/jdbc.
> > > >
> > > > Here is one of them:
> > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > > >
> > > > They work with any kind PreparedStatement - insert or update
> > > >
> > > > Chandni
> > > >
> > > >
> > > >
> > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > > bhupesh@datatorrent.com>
> > > > wrote:
> > > >
> > > >> Hi Chandni,
> > > >>
> > > >> I don't see an update query being handled in the operator. Could you
> > > >> please
> > > >> point me to the appropriate class?
> > > >> Or did you mean that handling a update query is just a matter of
> > > extending
> > > >> the class and providing a concrete implementation?
> > > >>
> > > >> Thanks.
> > > >> -Bhupesh
> > > >>
> > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > > chandni@datatorrent.com>
> > > >> wrote:
> > > >>
> > > >> > Hi Bhupesh,
> > > >> >
> > > >> > The current abstract JDBC Output Operators  in library are generic
> > and
> > > >> have
> > > >> > already been used in multiple POCs and applications. In fact
this
> > > >> operator
> > > >> > has matured through customer use cases. It is not just an insert
> > > >> operator.
> > > >> > We have used it to perform update and inserts.
> > > >> >
> > > >> > That said, I don't think it is a good idea to introduce input
> format
> > > in
> > > >> >  these abstract implementations. It is written to handle any
type
> of
> > > >> query,
> > > >> > be it a procedure call (that was an actual customer use case).
> > > >> >
> > > >> > Chandni
> > > >> >
> > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> > > >> bhupesh@datatorrent.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi All,
> > > >> > >
> > > >> > > We are planning to proceed with the following approach for
JDBC
> > > >> *update*
> > > >> > > operator:
> > > >> > >
> > > >> > >    - *Update Query Configuration*
> > > >> > >
> > > >> > >
> > > >> > >    - Example Update Query: *update tableName set a = ?**
where b
> > = ?
> > > >> and
> > > >> > c
> > > >> > >    > ?;*
> > > >> > >    - Example JSON input array for parameter instantiations:
> *[{a,
> > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c,
> expression,
> > > >> > DATE}]*
> > > >> > >
> > > >> > > We are also planning to change the JDBC Output Operator
in
> Malhar
> > > >> Library
> > > >> > > which currently does just insert. We plan to make the input
> format
> > > >> > > consistent for both insert and update and hence the change
to
> the
> > > >> current
> > > >> > > way of configuration using JSON. Following would be the
config
> for
> > > >> > inserts:
> > > >> > >
> > > >> > >    - *Insert Query Configuration*
> > > >> > >
> > > >> > >
> > > >> > >    - Example Insert Query: *insert into tableName values
(?, ?,
> > .. ,
> > > >> ?);*
> > > >> > >    - Example JSON input array for parameter instantiations:
> *[{a,
> > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, .. ,
{c,
> > > >> expression,
> > > >> > > DATE}]*
> > > >> > >
> > > >> > > Please let us know your thoughts.
> > > >> > >
> > > >> > > Thanks.
> > > >> > >
> > > >> > > -Bhupesh
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> > > >> bhupesh@datatorrent.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi All,
> > > >> > > >
> > > >> > > > Would it be a good idea to introduce the update functionality
> to
> > > the
> > > >> > JDBC
> > > >> > > > output operator in Apache Apex Malhar library.
> > > >> > > >
> > > >> > > > The following are possible approaches:
> > > >> > > >
> > > >> > > >    1. Accept a update query from the user with place
holders
> for
> > > >> > values.
> > > >> > > >    Example: *update tableName set a = ?, b = ? where
c = ? and
> > d >
> > > >> ?*.
> > > >> > > >    Here "?" will be provided by the user as java expressions
> > which
> > > >> will
> > > >> > > be
> > > >> > > >    evaluated from the incoming tuple.
> > > >> > > >    2. Another option is to accept in some configuration
format
> > > >> (json /
> > > >> > > >    xml) the following and formulate the query in the
operator.
> > > This
> > > >> can
> > > >> > > become
> > > >> > > >    arbitrarily complex.
> > > >> > > >    1. update clause columns
> > > >> > > >       2. update clause expressions
> > > >> > > >       3. where clause columns
> > > >> > > >       4. where clause expressions
> > > >> > > >
> > > >> > > > I am thinking about going ahead with 1. Please let
me know if
> > any
> > > >> other
> > > >> > > > option is possible and whether such a functionality
already
> > exists
> > > >> in
> > > >> > > some
> > > >> > > > other class.
> > > >> > > >
> > > >> > > > Thanks.
> > > >> > > >
> > > >> > > > -Bhupesh
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message