Mailing-List: contact dev-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@flink.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAAqc5=HYUaXA9nXFb2n_+8XpqSjWH4gNatQseDwD6fCoG4Qb=Q@mail.gmail.com>
References: 
 <CAAqc5=HYUaXA9nXFb2n_+8XpqSjWH4gNatQseDwD6fCoG4Qb=Q@mail.gmail.com>
From: Robert Metzger <rmetzger@apache.org>
Date: Fri, 11 Mar 2016 23:00:08 +0100
Message-ID: 
 <CAGr9p8AZi3SGk_KdOwDt2PJPc3OROnAZFXrw_F11n2GbDyXKKw@mail.gmail.com>
Subject: Re: Dynamically repartitioned sources
To: "dev@flink.apache.org" <dev@flink.apache.org>
Content-Type: multipart/alternative; boundary=001a11401eba0d33f0052dcd0f7a

--001a11401eba0d33f0052dcd0f7a
Content-Type: text/plain; charset=UTF-8

Hi Maxim,

you can implement a source for the system you are describing without
changing the parallelism of Flink. What you have to do is implement your
own data sources for Flink.
I would start by implementing the ParallelSourceFunction interface, where
each parallel source instance is reading from a subset of servers.

So basically one "flink partition" is reading from one or more partitions
of your system.


On Mon, Mar 7, 2016 at 9:18 PM, Maxim <mfateev@gmail.com> wrote:

> I'm looking at using Flink for a streaming project that has to use some
> internal systems as event sources. They are very similar to Kafka in their
> semantic. The data is partitioned and each partition can be replayed from a
> specified offset.
>
> The first system creates and deletes such partitions dynamically based on
> load. It provides an API to get list of partitions as well as their state
> (open, closed for append).
>
> The second system has a fixed set of a few thousand partitions, but they
> are allocated to a dynamic set of hosts and each host provides poll API
> that returns events from all partitions that currently reside on it. The
> metadata API that returns current mapping of partitions to hosts is
> provided.
>
> I found a thread
> <
> http://mail-archives.apache.org/mod_mbox/flink-user/201602.mbox/%3CCANjo42xgyUZAU=fmgGVFXVYMj7nVt67=3eJY=pWRc_nZdQ-EkA@mail.gmail.com%3E
> >
> that
> mentioned that changing parallelism is one of the high priority items for
> this year. Has any work started on it? And would it support the type of
> dynamic sources we have?
>
> I could try adding such support myself if it would help to speed things up.
>
> Thanks,
>
> Maxim.
>

--001a11401eba0d33f0052dcd0f7a--