kafka-dev mailing list archives

From "Esko Suomi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-519) Allow committing the state of single KafkaStream
Date Wed, 06 Feb 2013 06:29:15 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esko Suomi updated KAFKA-519:

    Priority: Minor  (was: Major)
> Allow committing the state of single KafkaStream
> ------------------------------------------------
>                 Key: KAFKA-519
>                 URL: https://issues.apache.org/jira/browse/KAFKA-519
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.7, 0.7.1
>            Reporter: Esko Suomi
>            Priority: Minor
> Currently, consuming multiple topics through ZK is done by first acquiring a ConsumerConnector
and then fetching message streams for the wanted topics. Once the messages have been consumed,
the current consuming state is committed with the method ConsumerConnector#commitOffsets().
> This scheme has a flaw when the consuming application acts as a kind of data-piping
proxy rather than a final consuming sink. In our case we read data from Kafka, repackage it and
only then move it to persistent storage. The repackaging step is relatively long-running (usually
a few minutes, but it may span several hours), and on top of that the topic throughputs are highly
asymmetric: one of our roughly 20 topics gets about 80% of the total throughput. As an unwanted
side effect, committing the offset whenever the per-topic persistence step has completed also
commits the offsets of the other topics, whose data may not yet have been persisted. This may
eventually manifest as loss of data if the consuming application or the machine it is
running on crashes.
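The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyConnector`, `commitTopic`, `lostAfterCrash` and the topic names "busy" and "quiet" are hypothetical and are not part of the Kafka API. Only the idea of a connector-wide commit is taken from the description of ConsumerConnector#commitOffsets().

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical and are NOT the Kafka consumer API.
class ToyConnector {
    // Offset each topic has been read up to (in memory only).
    private final Map<String, Long> consumed = new HashMap<>();
    // Offset each topic has been committed at (what survives a crash).
    private final Map<String, Long> committed = new HashMap<>();

    void consume(String topic, long count) {
        consumed.merge(topic, count, Long::sum);
    }

    // Connector-wide commit, analogous in spirit to ConsumerConnector#commitOffsets():
    // every topic's consumed offset is committed, whether or not its data was persisted.
    void commitAll() {
        committed.putAll(consumed);
    }

    // Hypothetical per-topic commit, the kind of operation the issue asks for.
    void commitTopic(String topic) {
        committed.put(topic, consumed.getOrDefault(topic, 0L));
    }

    long committedOffset(String topic) {
        return committed.getOrDefault(topic, 0L);
    }
}

class OffsetCommitSketch {
    // Messages lost after a crash: committed as consumed but never persisted.
    static long lostAfterCrash(ToyConnector c, String topic, long persisted) {
        return Math.max(0, c.committedOffset(topic) - persisted);
    }

    public static void main(String[] args) {
        ToyConnector global = new ToyConnector();
        global.consume("busy", 800);   // repackaged and persisted
        global.consume("quiet", 10);   // still being repackaged, not persisted
        global.commitAll();            // triggered by the "busy" persistence step
        System.out.println(lostAfterCrash(global, "quiet", 0));

        ToyConnector perTopic = new ToyConnector();
        perTopic.consume("busy", 800);
        perTopic.consume("quiet", 10);
        perTopic.commitTopic("busy");  // only the persisted topic is committed
        System.out.println(lostAfterCrash(perTopic, "quiet", 0));
    }
}
```

In this model the connector-wide commit leaves 10 "quiet" messages marked as consumed although none were persisted, so a crash at that point would lose them; with the hypothetical per-topic commit the count is 0.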
> So, while this loss of data can be alleviated to some extent with, for example, local temporary
storage, it would be cleaner if KafkaStream itself allowed for partition level offset

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
