flink-dev mailing list archives

From Pawan Manishka Gunarathna <pawan.manis...@gmail.com>
Subject Re: [Dev] Flink 'InputFormat' Interface execution related problem
Date Wed, 25 Jan 2017 05:01:42 GMT
Hi,
Thanks a lot to Fabian and Flavio. That information is really helpful.

On Tue, Jan 24, 2017 at 3:36 PM, Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> If the column on which you want to perform the split is numeric, you can
> use the NumericBetweenParametersProvider class, which automatically
> computes the splits for you. This is an example of its usage (in windows of
> 1000 items at a time) taken from the test class *JDBCInputFormatTest*:
>
> final int fetchSize = 1000;
> final Long min = 0L;
> final Long max = 1_000_000L;
> ParameterValuesProvider pramProvider =
>     new NumericBetweenParametersProvider(fetchSize, min, max);
> jdbcInputFormat = JDBCInputFormat.buildJDBCInputFormat()
> .setDrivername(DRIVER_CLASS)
> .setDBUrl(DB_URL)
> .setQuery(JDBCTestBase.SELECT_ALL_BOOKS_SPLIT_BY_ID)
> .setRowTypeInfo(rowTypeInfo)
> .setParametersProvider(pramProvider)
> .setResultSetType(ResultSet.TYPE_SCROLL_INSENSITIVE)
> .finish();
>
> I hope this could help,
> Flavio
>
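The way a numeric range turns into per-split BETWEEN bounds can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Flink implementation: the class `BetweenRanges` and its `compute` method are hypothetical names, and the exact boundaries Flink produces may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not Flink code): cuts [min, max] into inclusive id ranges. */
final class BetweenRanges {

    /**
     * Returns {start, end} pairs, both inclusive, that cover min..max in order.
     * Each pair spans at most fetchSize ids. Assumes max + fetchSize fits in a long.
     */
    static List<long[]> compute(long fetchSize, long min, long max) {
        if (fetchSize <= 0 || min > max) {
            throw new IllegalArgumentException("need fetchSize > 0 and min <= max");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = min; start <= max; start += fetchSize) {
            long end = Math.min(start + fetchSize - 1, max);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each pair would fill the two placeholders of a query such as
        // "... WHERE id BETWEEN ? AND ?" (the column name is an assumption).
        for (long[] r : compute(1000, 0, 4500)) {
            System.out.println("id BETWEEN " + r[0] + " AND " + r[1]);
        }
    }
}
```

Each pair corresponds to one input split, so the number of pairs bounds how many partial queries can run in parallel.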
> On Tue, Jan 24, 2017 at 10:57 AM, Fabian Hueske <fhueske@gmail.com> wrote:
>
> > Hi,
> >
> > JDBCInputFormat implements the InputFormat interface and is handled
> > exactly like any other InputFormat.
> > In contrast to file-based input formats, users must explicitly specify
> > the input splits by providing an array of parameter values which are
> > injected into a parameterized query.
> > This is done because it is not easy to implement a generic method that
> > automatically splits a query into multiple (preferably equal-sized)
> > partial queries.
> >
> > Best, Fabian
> >
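The idea of one parameter row per input split can be illustrated without any Flink or JDBC dependency. The sketch below is an assumption-laden illustration: `ExplicitSplits` and `partialQueries` are hypothetical names, the table and column are invented, and the textual substitution only visualizes the result. Real code would bind the values through a PreparedStatement rather than build SQL strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: one row of parameter values yields one partial query. */
final class ExplicitSplits {

    /** Fills the '?' placeholders of the template, in order, once per parameter row. */
    static List<String> partialQueries(String template, Object[][] parameterRows) {
        List<String> queries = new ArrayList<>();
        for (Object[] row : parameterRows) {
            StringBuilder sb = new StringBuilder();
            int next = 0;
            for (char c : template.toCharArray()) {
                if (c == '?') {
                    Object value = row[next++];
                    // Quote strings for readability only; this is not safe SQL escaping.
                    sb.append(value instanceof String ? "'" + value + "'" : String.valueOf(value));
                } else {
                    sb.append(c);
                }
            }
            queries.add(sb.toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        // Table "books" and column "author" are assumptions made for this example.
        Object[][] rows = {{"A", "M"}, {"N", "Z"}};
        for (String q : partialQueries("SELECT * FROM books WHERE author BETWEEN ? AND ?", rows)) {
            System.out.println(q);
        }
    }
}
```

Because the user chooses the rows, the split can follow any column, at the cost of having to pick boundaries that give roughly equal-sized partial results.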
> > 2017-01-24 6:31 GMT+01:00 Pawan Manishka Gunarathna <
> > pawan.manishka@gmail.com>:
> >
> > > Hi,
> > > Thanks for your help. Since our data source has a database-table
> > > architecture, I am thinking of following the 'JDBCInputFormat' in Flink.
> > > It would be great if you could provide some information on how the
> > > JDBCInputFormat execution happens.
> > >
> > > Thanks,
> > > Pawan
> > >
> > > On Mon, Jan 23, 2017 at 4:18 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> > >
> > > > Hi Pawan,
> > > >
> > > > I don't think this works. The InputSplits are generated by the
> > > > JobManager, i.e., by a single process and not in parallel.
> > > > After the parallel InputFormats have been started on the TaskManagers,
> > > > they request InputSplits and open() them. If there are no InputSplits,
> > > > there is no work to be done and open() will not be called.
> > > > You can tweak the behavior by implementing your own InputSplits and an
> > > > InputSplitAssigner that assigns exactly one input split to each task.
> > > >
> > > > Fabian
> > > >
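The request-and-open cycle described above can be simulated in plain Java. This is a sketch under stated assumptions and does not use Flink's actual InputSplit or InputSplitAssigner interfaces: `SplitLifecycle`, `Assigner`, `OnePerTaskAssigner`, and `runTask` are hypothetical names, and splits are represented by plain integers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical simulation of split assignment; not Flink's real interfaces. */
final class SplitLifecycle {

    /** Stands in for the central assigner: hands out each split exactly once. */
    static final class Assigner {
        private final Queue<Integer> pending = new ArrayDeque<>();

        Assigner(int numSplits) {
            for (int i = 0; i < numSplits; i++) {
                pending.add(i);
            }
        }

        /** Returns the next split, or null when none are left. */
        synchronized Integer next() {
            return pending.poll();
        }
    }

    /** Variant that gives each parallel task exactly one split, keyed by task index. */
    static final class OnePerTaskAssigner {
        private final boolean[] handedOut;

        OnePerTaskAssigner(int parallelism) {
            handedOut = new boolean[parallelism];
        }

        synchronized Integer next(int taskIndex) {
            if (handedOut[taskIndex]) {
                return null;
            }
            handedOut[taskIndex] = true;
            return taskIndex;
        }
    }

    /** One parallel task: requests splits until none remain and "opens" each one. */
    static List<Integer> runTask(Assigner assigner) {
        List<Integer> opened = new ArrayList<>();
        Integer split;
        while ((split = assigner.next()) != null) {
            opened.add(split); // stands in for open(split) followed by reading records
        }
        return opened;
    }

    public static void main(String[] args) {
        System.out.println("three splits: " + runTask(new Assigner(3)));
        System.out.println("no splits:    " + runTask(new Assigner(0)));
    }
}
```

With zero splits the loop body never runs, which mirrors the point that open() is never called when no InputSplits exist. The one-per-task variant shows how a custom assigner could guarantee that every parallel instance receives work exactly once.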
> > > > 2017-01-23 8:44 GMT+01:00 Pawan Manishka Gunarathna <
> > > > pawan.manishka@gmail.com>:
> > > >
> > > > > Hi,
> > > > >
> > > > > When we are implementing the Flink *InputFormat* interface, if the
> > > > > *input split creation* part already exists in our data analytics
> > > > > server APIs, can we go directly to the second phase of the Flink
> > > > > InputFormat interface execution?
> > > > >
> > > > > Basically, I need to know whether we can read those InputSplits
> > > > > directly, without generating InputSplits inside the InputFormat
> > > > > interface. It would be great if you could provide any kind of help.
> > > > >
> > > > > Thanks,
> > > > > Pawan
> > > > >
> > > > > --
> > > > >
> > > > > *Pawan Gunaratne*
> > > > > *Mob: +94 770373556*
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > *Pawan Gunaratne*
> > > *Mob: +94 770373556*
> > >
> >
>



-- 

*Pawan Gunaratne*
*Mob: +94 770373556*
