apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Siyuan Hua <siy...@datatorrent.com>
Subject Re: Support new major kafka release 0.9.0
Date Tue, 24 Nov 2015 20:59:09 GMT
Hi all,

I need your idea for the design of OffsetManager for kafka input operator

First of all, some background of OffsetManager and why we may need it. The
OffsetManager is a plugin in kafka input operator for cutomized offset
management. The API will be called if the consumer offset(the message that
has been emitted, along with the window id) changes.
OffsetManager is different from offset checkpointing, we still use
checkpointed offset to recover node from failure.

2 reasons for the need of OffsetManager:
  1) User might want to store offsets in their own way (hdfs, zookeeper,
database, etc)
  2) User might want to continue consuming at application restart.

In the current version, the OffsetManager works in a central mode, each
partition only reports the offset(s) to Statslistener, the listener calls
OffsetManager to update the offsets.
The other possibility is make the OffsetManager work in a distributed
mode.  Each partition update the offset(s) on its own.

The distributed mode is more straightforward, but the developer needs to
know it's distributed, you have to manage write from multiple nodes,
collisions on your own. But also it's more real time at no risk of failure
of stats reporter

Any input is welcome, thanks!


On Mon, Nov 16, 2015 at 10:53 AM, Siyuan Hua <siyuan@datatorrent.com> wrote:

> I will be working on rewriting the kafka input operator.
> Here is the ticket
> https://malhar.atlassian.net/browse/MLHR-1904
> Here is some comments on the ticket
> The RC2 is out here
> https://people.apache.org/~junrao/kafka-
> We will keep most features of the old input operator but the internal
> mechanism will be changed, for example, using new API to refresh the
> metadata
> The bugs that will be fixed:
>    - Synchronized offset checkpoint
>    - Transient offsetmanager
> New features:
>    - Support customized partition schema
>    - Default OffsetManager using new
> Improvement
>    - Add window id and application name to OffsetManager interface
>    - Support multi-topic
>    - Easy configuration
> Please leave thoughts here or on the ticket. Thanks
> Best,
> Siyuan

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message