apex-dev mailing list archives

From AJAY GUPTA <ajaygit...@gmail.com>
Subject Re: APEXMALHAR-2382 User needs to create dt_meta table while using JdbcPOJOInsertOutputOperator
Date Mon, 16 Jan 2017 07:55:22 GMT
Though it would be great to avoid creating the meta table, we cannot avoid it
if we want to achieve complete EXACTLY_ONCE here.
Even the Kafka documentation suggests that its exactly-once implementation is
not perfect:

*"This is not "perfect exact once" in 2 cases: 1 Multiple producers produce
messages to same kafka partition 2 You have same message sent out and
before kafka synchronized this message among all the brokers, the operator
is started again."*

Even other distributed protocols such as three-phase commit, which are
generally used in distributed systems, rely on write-ahead logging (WAL) on
each system participating in the transaction. Since a DB is involved, we
cannot have a WAL without something like the dt_meta table. So even such
protocols won't be useful here.

The following article also suggests that exactly-once in a distributed
system is not possible without committing the offset to the same system
that holds the data. Also, unlike Kafka, we don't have an offset with a DB:
http://ben.kirw.in/2014/11/28/kafka-patterns/


One suggestion, though: if the user is reluctant to create a table in the
database, we can use HDFS to give an "almost EXACTLY_ONCE" guarantee, like
Kafka.

Ajay


On Mon, Jan 16, 2017 at 12:04 PM, Chinmay Kolhatkar <chinmay@apache.org>
wrote:

-1 for automatic schema creation...

Moreover, I am wondering whether asking the user to create a dt_meta table is
the right way. From an admin's perspective, a request to create a meta table
looks wrong to me. The dt_meta table is created for the purpose of exactly-once,
but it does not hold any user data. On this logic, an admin might deny
developers permission to create the table.

I suggest starting a separate thread on doing exactly-once for JDBC insert in
a cleaner way. We can take a look at the Kafka or File outputs to see how
they achieve exactly-once without creating a meta location at the
destination.

-Chinmay.

On Mon, Jan 16, 2017 at 11:16 AM, Pradeep Kumbhar <pradeep@datatorrent.com>
wrote:

> +1 on having operator documentation explicitly mentioning that, "dt_meta"
> table is mandatory for the operator to work correctly. Also provide a
> sample table creation query for reference.
>
> On Sat, Jan 14, 2017 at 1:05 PM, AJAY GUPTA <ajaygit158@gmail.com> wrote:
>
> > Since the query can be different for different databases, the user will
> > have to provide the query to the operator. Rather than this, I believe
> > it's easier for the user to directly execute the create table query on
> > the DB.
> >
> > Also, the create table script won't be so heavy that we need to create
> > a script for it. Probably adding a generic type of query in the docs
> > itself should suffice.
> >
> > Ajay
> >
> > On Sat, 14 Jan 2017 at 10:27 AM, Yogi Devendra <yogidevendra@apache.org>
> > wrote:
> >
> > > As Aniruddha pointed out, table creation should be done by dbadmin.
> > > In that case, utility script will be helpful.
> > >
> > > If we embed this code inside operator or application; then it will be
> > > difficult for dbadmin to use it.
> > >
> > > ~ Yogi
> > >
> > > On 14 January 2017 at 03:43, Thomas Weise <thw@apache.org> wrote:
> > >
> > > > -1 for automatic schema modification, unless the user asked for it.
> > > > See comment on JIRA.
> > > >
> > > > On Fri, Jan 13, 2017 at 5:11 AM, Aniruddha Thombare <
> > > > aniruddha@datatorrent.com> wrote:
> > > >
> > > > > The tables should be created / altered by dbadmin.
> > > > > We shouldn't worry about table creation as it's a one-time activity.
> > > > >
> > > > > Thanks,
> > > > > A
> > > > >
> > > > > _____________________________________
> > > > > Sent with difficulty, I mean handheld ;)
> > > > >
> > > > > On 13 Jan 2017 6:37 pm, "Yogi Devendra" <yogidevendra@apache.org>
> > > > > wrote:
> > > > >
> > > > > I am not very keen on having utility script.
> > > > > But, "no side-effects without explicit ask by the end-user" is
> > > > > important.
> > > > >
> > > > > ~ Yogi
> > > > >
> > > > > On 13 January 2017 at 16:44, Priyanka Gugale <priyag@apache.org>
> > > > > wrote:
> > > > >
> > > > > > IMO it's okay to create the table in Java code. We should document
> > > > > > it in the operator guide as well as put a log message when we
> > > > > > create the table.
> > > > > > And in case you don't have privileges, the operator should throw
> > > > > > a meaningful message.
> > > > > >
> > > > > > -Priyanka
> > > > > >
> > > > > > On Fri, Jan 13, 2017 at 4:07 PM, Yogi Devendra <
> > > > > > yogidevendra@apache.org> wrote:
> > > > > >
> > > > > > > My suggestions:
> > > > > > >
> > > > > > >    1. Have a separate utility script for creating this table.
> > > > > > >    2. Have a README for the utility script.
> > > > > > >    3. Mention the utility script in the operator javadocs.
> > > > > > >    4. Mention the utility script in the application README.
> > > > > > >    5. If at all you wish to ease the process, you can introduce
> > > > > > >    a flag like autoPopulateMetaTable. But the default value of
> > > > > > >    this flag should be off.
> > > > > > >    6. I would prefer to avoid side-effects unless explicitly
> > > > > > >    asked by the end user.
> > > > > > >    7. Relevant exceptions should be caught and should have a
> > > > > > >    message which can be understood by the end user.
> > > > > > >
> > > > > > > ~ Yogi
> > > > > > >
> > > > > > > On 13 January 2017 at 15:57, Hitesh Kapoor <
> > > > > > > hitesh@datatorrent.com> wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > Currently, to use JdbcPOJOInsertOutputOperator, the user needs
> > > > > > > > to create a "dt_meta" table to enforce exactly-once processing
> > > > > > > > semantics. If the user fails to create this table before
> > > > > > > > launching the application, an exception is thrown.
> > > > > > > > To handle this scenario we can automate the process of creating
> > > > > > > > this table, assuming the user has the appropriate privileges.
> > > > > > > > The problem with this approach is that it may not be a very
> > > > > > > > good idea to modify the user's database automatically. Also, if
> > > > > > > > the user doesn't have the appropriate privileges, it will
> > > > > > > > eventually throw an exception (however, a different exception).
> > > > > > > > So I need your opinion on whether we should automate the
> > > > > > > > creation of this internal table (if it doesn't exist), continue
> > > > > > > > with the existing behaviour, or do anything else.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Hitesh
>
> --
> *regards,*
> *~pradeep*