Return-Path: X-Original-To: apmail-flink-dev-archive@www.apache.org Delivered-To: apmail-flink-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CA99A1857A for ; Fri, 11 Mar 2016 22:00:29 +0000 (UTC) Received: (qmail 921 invoked by uid 500); 11 Mar 2016 22:00:29 -0000 Delivered-To: apmail-flink-dev-archive@flink.apache.org Received: (qmail 851 invoked by uid 500); 11 Mar 2016 22:00:29 -0000 Mailing-List: contact dev-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list dev@flink.apache.org Received: (qmail 840 invoked by uid 99); 11 Mar 2016 22:00:29 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 11 Mar 2016 22:00:29 +0000 Received: from mail-lb0-f178.google.com (mail-lb0-f178.google.com [209.85.217.178]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id 342A71A010E for ; Fri, 11 Mar 2016 22:00:29 +0000 (UTC) Received: by mail-lb0-f178.google.com with SMTP id k15so174683111lbg.0 for ; Fri, 11 Mar 2016 14:00:28 -0800 (PST) X-Gm-Message-State: AD7BkJJpjHTDhV7qFLXBzfOhv0VUlf5DxmzevQ82AYkrxLWitwPIjrl2tUVq61j5DGhjDos9K6pXT2BwpjrZPQ== X-Received: by 10.25.28.80 with SMTP id c77mr4183724lfc.5.1457733627619; Fri, 11 Mar 2016 14:00:27 -0800 (PST) MIME-Version: 1.0 Received: by 10.112.190.67 with HTTP; Fri, 11 Mar 2016 14:00:08 -0800 (PST) In-Reply-To: References: From: Robert Metzger Date: Fri, 11 Mar 2016 23:00:08 +0100 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Dynamically repartitioned sources To: "dev@flink.apache.org" Content-Type: multipart/alternative; boundary=001a11401eba0d33f0052dcd0f7a --001a11401eba0d33f0052dcd0f7a Content-Type: text/plain; charset=UTF-8 Hi Maxim, you can implement a source for the system you are describing without changing the parallelism of Flink. What you have to do is implement your own data sources for Flink. I would start by implementing the ParallelSourceFunction interface, where each parallel source instance is reading from a subset of servers. So basically one "flink partition" is reading from one or more partitions of your system. On Mon, Mar 7, 2016 at 9:18 PM, Maxim wrote: > I'm looking at using Flink for a streaming project that has to use some > internal systems as event sources. They are very similar to Kafka in their > semantic. The data is partitioned and each partition can be replayed from a > specified offset. > > The first system creates and deletes such partitions dynamically based on > load. It provides an API to get list of partitions as well as their state > (open, closed for append). > > The second system has a fixed set of a few thousand partitions, but they > are allocated to a dynamic set of hosts and each host provides poll API > that returns events from all partitions that currently reside on it. The > metadata API that returns current mapping of partitions to hosts is > provided. > > I found a thread > < > http://mail-archives.apache.org/mod_mbox/flink-user/201602.mbox/%3CCANjo42xgyUZAU=fmgGVFXVYMj7nVt67=3eJY=pWRc_nZdQ-EkA@mail.gmail.com%3E > > > that > mentioned that changing parallelism is one of the high priority items for > this year. Has any work started on it? And would it support the type of > dynamic sources we have? > > I could try adding such support myself if it would help to speed things up. > > Thanks, > > Maxim. > --001a11401eba0d33f0052dcd0f7a--