kafka-dev mailing list archives

From "Karthik Reddy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-4096) Kafka Backup and Recovery
Date Mon, 29 Aug 2016 03:40:21 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Reddy updated KAFKA-4096:
---------------------------------
    Description: 
Hi Team,

We are trying to move the data on a Kafka cluster from one region to another. A region here
could be a separate data center or a separate cluster within the same region.

To do this, we stopped ZK and Kafka on the old cluster, detached the EBS volumes where Kafka
stores all topic data, and then attached those EBS volumes to the new cluster.

We observed that the new ZK cluster came up with all the data that the previous ZK had persisted,
meaning all the topic metadata and consumer offset information. However, on the Kafka side, no
messages are visible, and all the index and log files are empty (zero size).

The recovery point and recovery offset checkpoints show the correct base offsets, matching
those in the old cluster.
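
For anyone who wants to reproduce this check on an attached volume, here is a minimal Python
sketch. It assumes the version 0 checkpoint layout (a version line, an entry-count line, then
one "<topic> <partition> <offset>" line per partition) and that partition directories are named
"<topic>-<partition>" directly under the broker's log directory. The function names and the
example path are hypothetical, not part of Kafka. It lists partitions whose checkpoint offset
is non-zero while every .log segment on disk is empty, which is the mismatch described above.

```python
from pathlib import Path


def read_checkpoint(path):
    """Parse an offset checkpoint file into {(topic, partition): offset}.

    Assumed layout (version 0): line 1 is the version, line 2 is the
    entry count, and each later line is "<topic> <partition> <offset>".
    """
    lines = Path(path).read_text().splitlines()
    version, expected = int(lines[0]), int(lines[1])
    if version != 0:
        raise ValueError(f"unsupported checkpoint version: {version}")
    entries = {}
    for line in lines[2:]:
        if not line.strip():
            continue  # tolerate trailing blank lines
        topic, partition, offset = line.rsplit(" ", 2)
        entries[(topic, int(partition))] = int(offset)
    if len(entries) != expected:
        raise ValueError(f"expected {expected} entries, found {len(entries)}")
    return entries


def find_suspect_partitions(log_dir):
    """Return (topic, partition, offset) where offset > 0 but no segment data exists."""
    log_dir = Path(log_dir)
    checkpoint = read_checkpoint(log_dir / "recovery-point-offset-checkpoint")
    suspects = []
    for (topic, partition), offset in sorted(checkpoint.items()):
        part_dir = log_dir / f"{topic}-{partition}"
        segment_bytes = 0
        if part_dir.is_dir():
            # Total size of all log segments for this partition.
            segment_bytes = sum(p.stat().st_size for p in part_dir.glob("*.log"))
        if offset > 0 and segment_bytes == 0:
            suspects.append((topic, partition, offset))
    return suspects
```

Usage would be something like find_suspect_partitions("/path/to/kafka-logs"), run with the
broker stopped so the files are not changing underneath the script.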

Apart from using MirrorMaker to move the data for all the topics, is there a specific process
for copying file system snapshots from one region to another?

We restarted Kafka and ZK, but that didn't help.


Thanks,
Karthik

  was:
Hi Team,

We have seen the below messages in the Kafka logs, indicating there was a timeout on ZK.

Could you please advise us on how to tune or better optimize the Kafka-ZK communication.

Kafka and ZK are on separate servers. Currently, we have the ZK timeout set to 6000 ms.
The Kafka servers use EBS volumes for disk.

We had to restart our consumers and ZK to resolve this issue.

[2016-03-10 02:29:25,858] INFO Unable to read additional data from server sessionid 0x5531d0003f30030,
likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:25,958] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
[2016-03-10 02:29:26,381] INFO Opening socket connection to server 10.200.77.74/10.200.77.74:8164.
Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:26,382] INFO Socket connection established to 10.200.77.74/10.200.77.74:8164,
initiating session (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:26,385] INFO Session establishment complete on server 10.200.77.74/10.200.77.74:8164,
sessionid = 0x5531d0003f30030, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:26,385] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2016-03-10 02:29:30,961] INFO conflict in /controller data: {"version":1,"brokerid":3,"timestamp":"1457594970952"}
stored data: {"version":1,"brokerid":5,"timestamp":"1457594970043"} (kafka.utils.ZkUtils$)
[2016-03-10 02:29:30,969] INFO New leader is 5 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2016-03-10 02:29:31,620] INFO [ReplicaFetcherManager on broker 3] Removed fetcher for partitions
[__consumer_offsets,0],[fulfillment.payments.autopay.mongooperation.response,1],[__consumer_offsets,20],[__consumer_offsets,40]
(kafka.server.ReplicaFetcherManager)
[2016-03-10 02:29:31,621] INFO [ReplicaFetcherManager on broker 3] Removed fetcher for partitions
[efit.framework.notification.error,1],[__consumer_offsets,15],[fulfillment.payments.autopay.processexception.notification,1],[__consumer_offsets,35]
(kafka.server.ReplicaFetcherManager)
[2016-03-10 02:29:31,621] INFO Truncating log efit.framework.notification.error-1 to offset
637. (kafka.log.Log)
[2016-03-10 02:29:31,621] INFO Truncating log __consumer_offsets-15 to offset 0. (kafka.log.Log)
[2016-03-10 02:29:31,622] INFO Truncating log fulfillment.payments.autopay.processexception.notification-1
to offset 0. (kafka.log.Log)
[2016-03-10 02:29:31,622] INFO Truncating log __consumer_offsets-35 to offset 0. (kafka.log.Log)
[2016-03-10 02:29:31,623] INFO Loading offsets from [__consumer_offsets,0] (kafka.server.OffsetManager)
[2016-03-10 02:29:31,624] INFO Loading offsets from [__consumer_offsets,20] (kafka.server.OffsetManager)
[2016-03-10 02:29:31,624] INFO Finished loading offsets from [__consumer_offsets,0] in 1 milliseconds.
(kafka.server.OffsetManager)
[2016-03-10 02:29:31,625] INFO Loading offsets from [__consumer_offsets,40] (kafka.server.OffsetManager)
[2016-03-10 02:29:31,625] INFO Finished loading offsets from [__consumer_offsets,20] in 1
milliseconds. (kafka.server.OffsetManager)
[2016-03-10 02:29:31,625] INFO Finished loading offsets from [__consumer_offsets,40] in 0
milliseconds. (kafka.server.OffsetManager)
[2016-03-10 02:29:31,627] INFO [ReplicaFetcherManager on broker 3] Added fetcher for partitions
List([[efit.framework.notification.error,1], initOffset 637 to broker id:1,host:10.200.77.78,port:8165]
, [[__consumer_offsets,15], initOffset 0 to broker id:1,host:10.200.77.78,port:8165] , [[fulfillment.payments.autopay.processexception.notification,1],
initOffset 0 to broker id:5,host:10.200.75.150,port:8165] , [[__consumer_offsets,35], initOffset
0 to broker id:1,host:10.200.77.78,port:8165] ) (kafka.server.ReplicaFetcherManager)
[2016-03-10 02:29:31,627] INFO [ReplicaFetcherThread-0-2], Shutting down (kafka.server.ReplicaFetcherThread

Thanks,
Karthik


> Kafka Backup and Recovery
> -------------------------
>
>                 Key: KAFKA-4096
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4096
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.0
>         Environment: RHEL 7.2, AWS EC2 compute instance
>            Reporter: Karthik Reddy
>            Assignee: Neha Narkhede
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
