apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: Changes to JdbcOutputOperator
Date Mon, 28 Dec 2015 10:46:09 GMT
Thanks Chandni for the comments.

Additionally I think, we should do away with constructing the SQL query in
the operator. This is because it is only possible in case of simple insert
statements. In case a where clause is needed in the insert statement, we
cannot construct the SQL easily. If we are accepting a parametrized query
from the user for update/merge, why not do the same for insert statements
as well? Then the hierarchy would look like:
- AbstractJdbcPOJOOutputOperator (As suggested)
-- JdbcPOJOOutputOperator (Takes care of all statements insert, update,
merge, delete)

We don't need a separate operator for inserts and update/merge.

Thanks.
-Bhupesh


On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <chandni@datatorrent.com>
wrote:

>   1. FieldInfo, is ultimately a custom object and any user who uses this
>    operator has to construct an object, populate it and then use it. We are
>    trying to avoid using any custom object and allow any user to use the
>    operator without writing any extra code; just configuration.
> FieldInfo is a way to provide configuration in a UI friendly way. Providing
> configuration as JSON is not UI friendly.
>
>    2. In case of update / merge, we need what SQL data types that the
>    expression provided by the user would evaluate to. In case of insert we
> can
>    go and fetch the data types of the columns directly from DB. However,
> the
>    same is not possible for custom expressions; "avg(salary)" for instance.
> Ok so here is where you can make a change.
> - JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
> - Abstraction is to fetch the type of column.
> - Add a concrete JDBCPojoInsertOutput  that derives the types of columns
> directly from DB. Please note that FieldInfo can also provide type of
> derived column.
>
>
> You mentioned "We are trying to avoid using any custom object and allow any
> user to use the
>    operator without writing any extra code".
> This I think is specific to your use case. You can create an extension of
> the above which takes JSON blob and creates FieldInfos from it.
>
> Chandni
>
>
>
>
>
> On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <bhupesh@datatorrent.com>
> wrote:
>
> > Hi Chandni,
> >
> > Following are the issues:
> >
> >    1. FieldInfo, is ultimately a custom object and any user who uses this
> >    operator has to construct an object, populate it and then use it. We
> are
> >    trying to avoid using any custom object and allow any user to use the
> >    operator without writing any extra code; just configuration.
> >    2. In case of update / merge, we need what SQL data types that the
> >    expression provided by the user would evaluate to. In case of insert
> we
> > can
> >    go and fetch the data types of the columns directly from DB. However,
> > the
> >    same is not possible for custom expressions; "avg(salary)" for
> instance.
> >
> > Thanks.
> >
> > -Bhupesh
> >
> >
> > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <chandni@datatorrent.com>
> > wrote:
> >
> > > Hi Bhupesh,
> > >
> > > JDBCPojoOutputOperator was written for a demo and therefore it was
> marked
> > > Evolving which is why I had mentioned that you should feel free to
> modify
> > > it.
> > >
> > > I think an insert query can be as complex as any other query. It uses
> > > FieldInfo because in the app builder it is easy for the user to provide
> > > that instead of JSON String.
> > >
> > > Can you please provide specifics about what it is that you find
> difficult
> > > to change/implement for providing update/merge/delete support in
> > > JDBCPojoOutputOperator?
> > >
> > > Chandni
> > >
> > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
> bhupesh@datatorrent.com
> > >
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > It has been pointed out that adding another class for handling
> update /
> > > > merge queries would not be a good option.
> > > >
> > > > Here are the current implementation details:
> > > >
> > > >    - We have an existing class: JdbcPOJOOutputOperator which accepts
> a
> > > list
> > > >    of FieldInfo objects. Each element of this list indicates the
> > details
> > > >    (column name, pojo field expression and datatype) of fields that
> > need
> > > > to be
> > > >    inserted. Using this, the operator formulates the insert query in
> > the
> > > > setup
> > > >    method and identifies the sql datatypes of these columns from the
> > > > database
> > > >    using the table name.
> > > >
> > > >
> > > >    - Now, coming to the update / merge feature, it is difficult to
> > > >    formulate the update / merge query in the operator logic due to
> the
> > > > complex
> > > >    structure of these statements. For this reason, we plan to take a
> > > >    parametrized SQL query from the user. This may look like: *"update
> > > table
> > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such statements
> can
> > be
> > > >    accepted from the user in addition to a json string which
> indicates
> > > the
> > > >    details for the parameters: *column name, the pojo expression and
> > the
> > > >    sql data type* of the expression. Note that this information is
> > > similar
> > > >    to the FieldInfo object, but is a string which can be configured
> > > easily
> > > > by
> > > >    the user. Also note that the data type which is accepted is the
> SQL
> > > data
> > > >    type. Using this info we can populate the parametrized query and
> run
> > > it
> > > >    based on the incoming POJO.
> > > >
> > > > The second approach is able to handle all kinds of queries (insert /
> > > update
> > > > / merge / delete). However, since we already have the
> > > > JdbcPOJOOutputOperator, we would like to merge the new functionality
> > into
> > > > the same class.
> > > >
> > > > Here we have the following options:
> > > >
> > > >    1. Change the existing class (JdbcPOJOOutputOperator) to the
> second
> > > >    approach which is more generic and also handles inserts.
> > > >    2. Add the update/ merge functionality to the existing class
> without
> > > >    changing the existing functionality. This will have two different
> > ways
> > > > that
> > > >    insert queries may be handled in the operator.
> > > >    3. Add another class which extends from JdbcPOJOOutputOperator and
> > > have
> > > >    the update/merge functionality there. (This is not recommended.)
> > > >    4. Any other approach.
> > > >
> > > > Please suggest.
> > > >
> > > > Thanks.
> > > >
> > > > -Bhupesh.
> > > >
> > > >
> > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
> > chandni@datatorrent.com
> > > >
> > > > wrote:
> > > >
> > > > > No I don't think we are restricting Malhar to just abstract
> classes.
> > > > > Whenever they are couple of use cases that we see quite often, we
> add
> > > > > concrete implementations.
> > > > >
> > > > > For eg. FileLineInputOperator which is a concrete implementation
of
> > > > > AbstractFileInputOperator. FSSliceReader is an example as well.
> > > > >
> > > > > In AbstractJdbcOutputOperator case there hasn't been such common
> > > > > insert/update query.
> > > > >
> > > > > Also if you look at the example I provided, it is very simple to
> > > provide
> > > > a
> > > > > concrete implementation.
> > > > >
> > > > > If you would like to change JdbcPOJOOutputOperator to work for
> > > > > "UPDATE/MERGE" then please go ahead.
> > > > >
> > > > > Chandni
> > > > >
> > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
> > > bhupesh@datatorrent.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > I see. So, just to understand more, do we plan to keep Malhar
> > > > restricted
> > > > > to
> > > > > > the base functionality (as in abstract classes)? And put the
> > > > > configuration
> > > > > > aspect / concrete implementations in apps that use these
> operators?
> > > > > >
> > > > > > Thanks.
> > > > > > Bhupesh
> > > > > >
> > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
> > > > chandni@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Here is an example of doing Upsert with JDBC:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Chandni
> > > > > > >
> > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> > > > > chandni@datatorrent.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > The operators are under Malhar/lib/db/jdbc.
> > > > > > > >
> > > > > > > > Here is one of them:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > > > > > > >
> > > > > > > > They work with any kind PreparedStatement - insert
or update
> > > > > > > >
> > > > > > > > Chandni
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > > > > > > bhupesh@datatorrent.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi Chandni,
> > > > > > > >>
> > > > > > > >> I don't see an update query being handled in the
operator.
> > Could
> > > > you
> > > > > > > >> please
> > > > > > > >> point me to the appropriate class?
> > > > > > > >> Or did you mean that handling a update query is
just a
> matter
> > of
> > > > > > > extending
> > > > > > > >> the class and providing a concrete implementation?
> > > > > > > >>
> > > > > > > >> Thanks.
> > > > > > > >> -Bhupesh
> > > > > > > >>
> > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh
<
> > > > > > > chandni@datatorrent.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Bhupesh,
> > > > > > > >> >
> > > > > > > >> > The current abstract JDBC Output Operators
 in library are
> > > > generic
> > > > > > and
> > > > > > > >> have
> > > > > > > >> > already been used in multiple POCs and applications.
In
> fact
> > > > this
> > > > > > > >> operator
> > > > > > > >> > has matured through customer use cases. It
is not just an
> > > insert
> > > > > > > >> operator.
> > > > > > > >> > We have used it to perform update and inserts.
> > > > > > > >> >
> > > > > > > >> > That said, I don't think it is a good idea
to introduce
> > input
> > > > > format
> > > > > > > in
> > > > > > > >> >  these abstract implementations. It is written
to handle
> any
> > > > type
> > > > > of
> > > > > > > >> query,
> > > > > > > >> > be it a procedure call (that was an actual
customer use
> > case).
> > > > > > > >> >
> > > > > > > >> > Chandni
> > > > > > > >> >
> > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh
Chawda <
> > > > > > > >> bhupesh@datatorrent.com>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Hi All,
> > > > > > > >> > >
> > > > > > > >> > > We are planning to proceed with the
following approach
> for
> > > > JDBC
> > > > > > > >> *update*
> > > > > > > >> > > operator:
> > > > > > > >> > >
> > > > > > > >> > >    - *Update Query Configuration*
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >    - Example Update Query: *update tableName
set a = ?**
> > > > where b
> > > > > > = ?
> > > > > > > >> and
> > > > > > > >> > c
> > > > > > > >> > >    > ?;*
> > > > > > > >> > >    - Example JSON input array for parameter
> > instantiations:
> > > > > *[{a,
> > > > > > > >> > >    expression, INTEGER}, {b, expression,
VARCHAR}, {c,
> > > > > expression,
> > > > > > > >> > DATE}]*
> > > > > > > >> > >
> > > > > > > >> > > We are also planning to change the JDBC
Output Operator
> in
> > > > > Malhar
> > > > > > > >> Library
> > > > > > > >> > > which currently does just insert. We
plan to make the
> > input
> > > > > format
> > > > > > > >> > > consistent for both insert and update
and hence the
> change
> > > to
> > > > > the
> > > > > > > >> current
> > > > > > > >> > > way of configuration using JSON. Following
would be the
> > > config
> > > > > for
> > > > > > > >> > inserts:
> > > > > > > >> > >
> > > > > > > >> > >    - *Insert Query Configuration*
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >    - Example Insert Query: *insert into
tableName values
> > (?,
> > > > ?,
> > > > > > .. ,
> > > > > > > >> ?);*
> > > > > > > >> > >    - Example JSON input array for parameter
> > instantiations:
> > > > > *[{a,
> > > > > > > >> > >    expression, INTEGER}, {b, expression,
VARCHAR}, .. ,
> > {c,
> > > > > > > >> expression,
> > > > > > > >> > > DATE}]*
> > > > > > > >> > >
> > > > > > > >> > > Please let us know your thoughts.
> > > > > > > >> > >
> > > > > > > >> > > Thanks.
> > > > > > > >> > >
> > > > > > > >> > > -Bhupesh
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh
Chawda <
> > > > > > > >> bhupesh@datatorrent.com>
> > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Hi All,
> > > > > > > >> > > >
> > > > > > > >> > > > Would it be a good idea to introduce
the update
> > > > functionality
> > > > > to
> > > > > > > the
> > > > > > > >> > JDBC
> > > > > > > >> > > > output operator in Apache Apex
Malhar library.
> > > > > > > >> > > >
> > > > > > > >> > > > The following are possible approaches:
> > > > > > > >> > > >
> > > > > > > >> > > >    1. Accept a update query from
the user with place
> > > holders
> > > > > for
> > > > > > > >> > values.
> > > > > > > >> > > >    Example: *update tableName set
a = ?, b = ? where c
> > = ?
> > > > and
> > > > > > d >
> > > > > > > >> ?*.
> > > > > > > >> > > >    Here "?" will be provided by
the user as java
> > > expressions
> > > > > > which
> > > > > > > >> will
> > > > > > > >> > > be
> > > > > > > >> > > >    evaluated from the incoming
tuple.
> > > > > > > >> > > >    2. Another option is to accept
in some
> configuration
> > > > format
> > > > > > > >> (json /
> > > > > > > >> > > >    xml) the following and formulate
the query in the
> > > > operator.
> > > > > > > This
> > > > > > > >> can
> > > > > > > >> > > become
> > > > > > > >> > > >    arbitrarily complex.
> > > > > > > >> > > >    1. update clause columns
> > > > > > > >> > > >       2. update clause expressions
> > > > > > > >> > > >       3. where clause columns
> > > > > > > >> > > >       4. where clause expressions
> > > > > > > >> > > >
> > > > > > > >> > > > I am thinking about going ahead
with 1. Please let me
> > know
> > > > if
> > > > > > any
> > > > > > > >> other
> > > > > > > >> > > > option is possible and whether
such a functionality
> > > already
> > > > > > exists
> > > > > > > >> in
> > > > > > > >> > > some
> > > > > > > >> > > > other class.
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks.
> > > > > > > >> > > >
> > > > > > > >> > > > -Bhupesh
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message