kafka-jira mailing list archives

From "Satya (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3726) Enable cold storage option
Date Wed, 20 Sep 2017 11:06:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173020#comment-16173020 ]

Satya commented on KAFKA-3726:

An ideal approach would be to use the Kafka Connect HDFS source/sink to take a backup of the Kafka
segment files, which could then be replayed when required.
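A minimal sketch of such a backup path, assuming Confluent's HDFS sink connector (a separate component, not part of Apache Kafka itself); the topic name, HDFS URL, and flush size below are placeholders:

```
name=kafka-cold-storage-backup
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
# Placeholder topic to back up.
topics=orders
# Placeholder HDFS namenode address.
hdfs.url=hdfs://namenode:8020
# Commit a file to HDFS after this many records.
flush.size=1000
```

Note that Connect copies individual records (re-serialized by the connector), not the raw segment files themselves, so this is record-level rather than segment-level cold storage.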

> Enable cold storage option
> --------------------------
>                 Key: KAFKA-3726
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3726
>             Project: Kafka
>          Issue Type: Wish
>            Reporter: Radoslaw Gruchalski
>         Attachments: kafka-cold-storage.txt
> This JIRA builds on the cold storage article I have published on Medium. A copy of the
article is attached here.
> The need for cold storage or an "indefinite" log seems to be quite often discussed on
the user mailing list.
> The cold storage idea would give the operator the ability to keep the raw Kafka
segment files in third-party storage and to retrieve the data later for re-consumption.
> The two possible options for enabling such functionality are, from the article:
> First approach: if Kafka provided a notification mechanism and could trigger a program
when a segment file is to be discarded, it would become feasible to provide a standard method
of moving data to cold storage in reaction to those events. Once the program finishes backing
the segments up, it could tell Kafka “it is now safe to delete these segments”.
> The second option is to provide an additional value for the {{log.cleanup.policy}} setting,
call it {{cold-storage}}. With this value, Kafka would move the segment files that would
otherwise be deleted to another destination on the server, from where they can be picked
up and moved to the cold storage.
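A sketch of what such a broker configuration might look like; both the {{cold-storage}} policy value and the destination setting are hypothetical and do not exist in Kafka:

```
# Hypothetical: cold-storage is NOT a valid log.cleanup.policy value today.
log.cleanup.policy=cold-storage
# Hypothetical setting: local directory where expired segments would be
# parked until an external tool moves them to cold storage.
log.cold.storage.dir=/var/kafka-cold-storage
```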
> Both have their limitations. The former is simply a mechanism exposed to allow the operator
to build the tooling necessary to enable this. Events could be published in a manner similar
to the Mesos Event Bus (https://mesosphere.github.io/marathon/docs/event-bus.html), or Kafka itself
could provide a control topic on which such information would be published. The outcome is that
the operator can subscribe to the event bus and get notified about, at least, two events:
> - log segment is complete and can be backed up
> - partition leader changed
> These two, together with an option to keep the log segment safe from compaction for a
certain amount of time, would be sufficient to reliably implement cold storage.
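The event-handling side of the first approach could be a small consumer of the proposed control topic. A minimal sketch of the decision logic, assuming a hypothetical `event-type:payload` value format for control messages (both the topic and the format are invented for illustration; nothing like this exists in Kafka today):

```java
import java.util.Optional;

// Decides what to do with a message from a hypothetical control topic.
// Values are assumed to look like
//   "segment-complete:/var/kafka-logs/orders-0/00000000000000000123.log"
//   "leader-changed:orders-0"
// In a real tool this class would be fed by a KafkaConsumer subscribed
// to the control topic; only the pure parsing logic is shown here.
class SegmentEventParser {

    // Returns the segment file path to back up, or empty if the event
    // requires no backup action.
    static Optional<String> segmentToBackUp(String eventValue) {
        if (eventValue == null) {
            return Optional.empty();
        }
        int sep = eventValue.indexOf(':');
        if (sep < 0) {
            return Optional.empty();
        }
        String type = eventValue.substring(0, sep);
        String payload = eventValue.substring(sep + 1);
        // Only "segment-complete" events carry a segment that is safe to
        // copy out; "leader-changed" and unknown events are ignored here.
        if ("segment-complete".equals(type) && !payload.isEmpty()) {
            return Optional.of(payload);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(segmentToBackUp(
                "segment-complete:/var/kafka-logs/orders-0/00000123.log"));
        System.out.println(segmentToBackUp("leader-changed:orders-0"));
    }
}
```

Once the backup tool has copied the file, it would report completion back to Kafka ("it is now safe to delete these segments"), which is the part that needs the retention hold mentioned above.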
> The latter option, the {{log.cleanup.policy}} setting, would be a more complete feature, but
it is also much more difficult to implement. All brokers would have to keep a backup of their
data in the cold storage, significantly increasing the size requirements; also, the de-duplication
of the replicated data would be left entirely to the operator.
> In any case, the thing to stay away from is having Kafka deal with the physical aspect
of moving the data to and from the cold storage. This is not Kafka's task. The intent
is to provide a method for reliable cold storage.

This message was sent by Atlassian JIRA
