kafka-dev mailing list archives

From "Joel Koshy (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-249) Separate out Kafka mirroring into a stand-alone app
Date Tue, 27 Mar 2012 01:01:36 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-249:

    Attachment: KAFKA-249.v2.patch

Sorry about the delay on this. Here is a patch incorporating the design
change. Overview of changes:

- Added a topic watcher to ZookeeperConsumerConnector for creating message
  streams based on filters.
- The API is slightly different from the previous patch - just one
  createMessageStreamsByFilter call instead of separate ones for
- Since we may now iterate over messages from multiple topics, added
  KafkaMessageAndTopicStream and TopicalConsumerIterator, which iterate over
  MessageAndTopic objects.
- For wildcarded consumption, the topic count string in ZK may need to change
  when new topics appear. To avoid extra logic for handling this, I added
  WildcardTopicCount, distinct from StaticTopicCount.
  WildcardTopicCount's encoding is described in the TopicCount class.
- New mirror maker tool.
- Updated mirror maker system test.
- Updated console consumer which now allows one of: topic, whitelist,
- Added a logIdent field to the Logging trait; it defaults to "". I'm not sure
  this is a great idea, but I needed it because the mirror maker may
  instantiate multiple ZK connectors, and otherwise it is very unclear which
  log messages come from which connector. For example:
  [2012-03-26 17:10:06,114] INFO group1_jkoshy-ld-1332807005602-c0176b3f Committing all offsets after clearing the fetcher queues (kafka.consumer.ZookeeperConsumerConnector)
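
The filter-based stream creation and topic-tagged iteration described in the bullets above can be illustrated with a small, self-contained Python sketch. This is not the patch's Scala code. `Whitelist`, `TopicWatcher`, `MessageAndTopic` and `topical_iterator` are hypothetical stand-ins, and an in-memory dict of topic -> messages replaces the brokers and ZooKeeper. Full-match regex semantics for the whitelist are also an assumption.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterator, List, Set


@dataclass(frozen=True)
class MessageAndTopic:
    """A message tagged with the topic it came from (hypothetical analogue)."""
    message: bytes
    topic: str


class Whitelist:
    """Topic filter: allows topics whose whole name matches the regex (assumed)."""

    def __init__(self, pattern: str) -> None:
        self._regex = re.compile(pattern)

    def allows(self, topic: str) -> bool:
        return self._regex.fullmatch(topic) is not None


class TopicWatcher:
    """Remembers subscribed topics so only newly matching topics are returned."""

    def __init__(self, topic_filter: Whitelist) -> None:
        self._filter = topic_filter
        self.subscribed: Set[str] = set()

    def on_topics_changed(self, all_topics: List[str]) -> List[str]:
        """Called with the full topic list; returns new matches, sorted."""
        new = sorted({t for t in all_topics
                      if self._filter.allows(t) and t not in self.subscribed})
        self.subscribed.update(new)
        return new


def topical_iterator(topic_filter: Whitelist,
                     logs: Dict[str, List[bytes]]) -> Iterator[MessageAndTopic]:
    """Yield every message from every allowed topic, tagged with its topic.

    Topics are visited in sorted order only to keep the output deterministic;
    a real consumer would interleave topics as data is fetched.
    """
    for topic in sorted(logs):
        if topic_filter.allows(topic):
            for message in logs[topic]:
                yield MessageAndTopic(message, topic)
```

Each change to the topic list yields only the newly matching topics. Per the notes above, this is the situation in which the topic count string in ZK may need to change for wildcarded consumption.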

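The logIdent idea is a per-instance prefix on every log line, empty by default. It maps naturally onto Python's standard `logging.LoggerAdapter`. This is a sketch of the concept only: `IdentLoggerAdapter` and `log_ident` are hypothetical names and not part of Kafka's Logging trait.

```python
import logging


class IdentLoggerAdapter(logging.LoggerAdapter):
    """Prefixes each message with a per-instance identifier (hypothetical
    analogue of logIdent); an empty identifier leaves messages unchanged."""

    def __init__(self, logger: logging.Logger, log_ident: str = "") -> None:
        super().__init__(logger, {})
        self.log_ident = log_ident

    def process(self, msg, kwargs):
        # Prepend the identifier and a space only when one is configured.
        prefix = f"{self.log_ident} " if self.log_ident else ""
        return f"{prefix}{msg}", kwargs
```

Each connector's logger can be wrapped with its own identifier, for example the consumer id string shown in the log line above. That makes interleaved output from several connectors in one mirror maker process distinguishable.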
A few other comments:

- Apparently there are issues with recreating message streams from the same
  zkconnector, so I have disabled that for all the createMessageStreams*
  methods. As implemented, createMessageStreamsByFilter allows only one call
  to it, but I don't think that is a serious
- I noticed a small caveat in using joptsimple - if I say --whitelist ".*"
  it interprets it as ".". However, --whitelist=".*". --whitelist ".+" all
- I encountered a small shutdown issue: in the system test, I have two
  connectors. When shutting them down, the first one to shut down triggers a
  rebalance in the other connector. However, that connector is itself
  shutting down and sets zkclient to null, so I see null pointer exceptions
  from accessing ZK as part of the rebalance. We should probably add an
  isRebalancing atomic boolean and not shut down while it is set, and vice
  versa. I can roll that into this patch if it makes sense.
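
The proposed rebalance/shutdown exclusion can be sketched in Python. Instead of the bare atomic boolean suggested above, this sketch swaps in a condition variable. Shutdown then waits for an in-flight rebalance to finish, and any rebalance triggered after shutdown has begun is skipped. `RebalanceShutdownGuard` and the `close_zk_client` callback are hypothetical; the callback stands in for setting zkclient to null.

```python
import threading
from typing import Callable


class RebalanceShutdownGuard:
    """Hypothetical guard that keeps rebalances and shutdown from overlapping."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active_rebalances = 0  # rebalances currently touching ZK
        self._shut_down = False      # set once shutdown has begun

    def begin_rebalance(self) -> bool:
        """Register a rebalance; return False if shutdown has already begun."""
        with self._cond:
            if self._shut_down:
                return False
            self._active_rebalances += 1
            return True

    def end_rebalance(self) -> None:
        """Mark a rebalance finished and wake any waiting shutdown."""
        with self._cond:
            self._active_rebalances -= 1
            self._cond.notify_all()

    def shutdown(self, close_zk_client: Callable[[], None]) -> None:
        """Wait for in-flight rebalances, then close the ZK client once."""
        with self._cond:
            while self._active_rebalances > 0:
                self._cond.wait()
            if self._shut_down:
                return
            self._shut_down = True
        # No rebalance is running and none can start, so closing ZK is safe.
        close_zk_client()
```

A rebalance listener would call `begin_rebalance()` first and return immediately if it gets False. It would call `end_rebalance()` in a `finally` block so a failed rebalance cannot block shutdown forever.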

> Separate out Kafka mirroring into a stand-alone app
> ---------------------------------------------------
>                 Key: KAFKA-249
>                 URL: https://issues.apache.org/jira/browse/KAFKA-249
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>             Fix For: 0.7.1
>         Attachments: KAFKA-249.v1.patch, KAFKA-249.v2.patch
> On this JIRA, I would like to discuss the feasibility/benefits of separating
> out Kafka's mirroring feature from the broker into a stand-alone app, as it
> currently has a couple of limitations and issues.
> For example, we recently had to deal with Kafka mirrors that were in fact
> idle because mirror threads were not created at start-up due to a
> rebalancing exception, but the Kafka broker itself did not shut down. This
> has since been fixed, but is indicative of (avoidable) problems in embedding
> non-broker-specific features in the broker.
> Logically, it seems to make sense to separate it out to achieve better
> division of labor. Furthermore, enhancements to mirroring may be less
> clunky to implement and use with a stand-alone app. For example, to support
> custom partitioning on the target cluster, or to mirror from multiple
> clusters we would probably need to be able to pass in multiple embedded
> consumer/embedded producer configs, which would be less ugly if the
> mirroring process were a stand-alone app.  Also, if we break it out, it
> would be convenient to use as a "consumption engine" for the console
> consumer which will make it easier to add on features such as wildcards in
> topic consumption, since it contains a ZooKeeper topic discovery component.
> Any suggestions and/or objections to this?
