apex-dev mailing list archives

From Devendra Tagare <devend...@datatorrent.com>
Subject Re: JdbcPOJOInputOperator Behaviour
Date Mon, 09 May 2016 17:34:24 GMT
Hi,

There is some work in progress on the JDBC polling operator; see
https://issues.apache.org/jira/browse/APEXMALHAR-2066

The feature set of that operator seems to be similar. That being said, I see
the rationale for updating the existing one, since it already works with
POJOs.

Why remove fetchDirection?
It can be passed as a parameter when creating the PreparedStatement to give
the direction in which the result set is processed.
We can use something like the following to instantiate the PreparedStatement
and make use of that parameter:
store.connection.prepareCall(queryToRetrieveData(),getFetchDirection(),
java.sql.ResultSet.CONCUR_READ_ONLY);
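
For reference, a minimal sketch of how a fetch direction is usually wired up in plain JDBC. In the three-argument Connection.prepareStatement (and prepareCall) overloads, the second argument is the result set type (ResultSet.TYPE_*), while the fetch direction (ResultSet.FETCH_*) is a hint set through Statement.setFetchDirection. The class and method names below are hypothetical, and choosing a scrollable cursor for non-forward directions is an assumption of this sketch, not something taken from the operator's code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of Malhar.
final class FetchDirectionSketch {

    /** Picks a result set type that is compatible with the requested fetch direction. */
    static int resultSetTypeFor(int fetchDirection) {
        // Assumption: forward-only cursors for FETCH_FORWARD, scrollable cursors otherwise.
        return fetchDirection == ResultSet.FETCH_FORWARD
                ? ResultSet.TYPE_FORWARD_ONLY
                : ResultSet.TYPE_SCROLL_INSENSITIVE;
    }

    /** Prepares a read-only statement and passes the fetch direction to the driver as a hint. */
    static PreparedStatement prepare(Connection connection, String query, int fetchDirection)
            throws SQLException {
        PreparedStatement statement = connection.prepareStatement(
                query, resultSetTypeFor(fetchDirection), ResultSet.CONCUR_READ_ONLY);
        // The driver is free to ignore this hint.
        statement.setFetchDirection(fetchDirection);
        return statement;
    }
}
```

With a live connection this would be called as FetchDirectionSketch.prepare(connection, query, ResultSet.FETCH_FORWARD); whether the direction has any effect depends on the JDBC driver.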

Thanks,

Dev

On Mon, May 9, 2016 at 8:56 AM, Mohit Jotwani <mohit@datatorrent.com> wrote:

> +1 for incremental data.
>
> Regards,
> Mohit
> On 9 May 2016 19:59, "Yogi Devendra" <devendra.vyavahare@gmail.com> wrote:
>
> > +1 for incremental data fetching.
> > for fetchDirection variable; it is better to get inputs from original
> > author (if possible).
> >
> > ~ Yogi
> >
> > On 9 May 2016 at 19:04, Akshay Gore <akshay@datatorrent.com> wrote:
> >
> > > +1 for incremental data fetching. This is a must-have feature.
> > >
> > > -Akshay
> > > On 09-May-2016 3:39 pm, "Sandeep Deshmukh" <sandeep@datatorrent.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am using JdbcPOJOInputOperator to ingest data from MySQL to HDFS. I
> > > > observed that once the existing data is ingested, data newly added to
> > > > MySQL is not ingested. However, if I add data to MySQL while the
> > > > ingestion is still in progress, the newly added data is also ingested
> > > > into HDFS.
> > > >
> > > > In the code, fetching data in batches is achieved using the fetchSize
> > > > parameter, which limits the number of tuples to fetch per result set;
> > > > pageNumber is used internally to calculate the offset as
> > > > (fetchSize * pageNumber). The pageNumber is incremented per window.
> > > >
> > > > Once the existing tuples are ingested, no further data is ingested, but
> > > > the pageNumber variable is still incremented. As a result, the operator
> > > > tries to fetch data beyond the number of tuples in the table/query
> > > > result.
> > > >
> > > > Changing the offset calculation to use the number of tuples read so far
> > > > will fix this issue, and the operator can then be used to poll for
> > > > newer data in the table.
> > > >
> > > > If you need to have a quick look at the code:
> > > > https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/db/jdbc/JdbcPOJOInputOperator.java
> > > >
> > > > Side observation: the fetchDirection variable is unused in the code. I
> > > > will remove it from the class.
> > > >
> > > > I would like to get your thoughts on my observations. I will create a
> > > > JIRA and open a PR based on the input received on this thread.
> > > >
> > > > Regards,
> > > > Sandeep
> > > >
> > >
> >
>
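
The offset issue described in the quoted message can be illustrated with a small simulation. This is only a sketch under stated assumptions: an in-memory list stands in for the MySQL table, fetch() stands in for a query with a row limit and an offset, and the class, method, and variable names are hypothetical rather than taken from JdbcPOJOInputOperator. It compares an offset of fetchSize * pageNumber with an offset equal to the number of tuples read so far, when three rows are added after the initial data has been drained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not the operator's actual code.
final class OffsetSketch {

    /** Stand-in for a query that returns at most fetchSize rows starting at offset. */
    static List<Integer> fetch(List<Integer> table, int offset, int fetchSize) {
        int from = Math.min(offset, table.size());
        int to = Math.min(from + fetchSize, table.size());
        return new ArrayList<>(table.subList(from, to));
    }

    /** Runs ten windows and returns how many rows were emitted in total. */
    static int run(boolean offsetByRowsRead) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            table.add(i);                  // five rows exist before polling starts
        }
        int fetchSize = 2;
        int pageNumber = 0;
        int rowsRead = 0;
        int emitted = 0;
        for (int window = 0; window < 10; window++) {
            if (window == 6) {             // new rows arrive after the table was drained
                table.add(5);
                table.add(6);
                table.add(7);
            }
            int offset = offsetByRowsRead ? rowsRead : fetchSize * pageNumber;
            List<Integer> batch = fetch(table, offset, fetchSize);
            emitted += batch.size();
            rowsRead += batch.size();
            pageNumber++;                  // incremented every window, even when the batch is empty
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println("page-based offset emitted: " + run(false));
        System.out.println("rows-read offset emitted:  " + run(true));
    }
}
```

In this simulation the page-based offset keeps growing during empty windows, so it has already moved past the rows added at window 6 and they are never fetched. The rows-read offset stays at 5 while the table is idle and picks up the new rows as soon as they appear.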
