kafka-jira mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6468) Replication high watermark checkpoint file read for every LeaderAndIsrRequest
Date Wed, 24 Jan 2018 06:03:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336969#comment-16336969 ]

ASF GitHub Bot commented on KAFKA-6468:
---------------------------------------

ambroff opened a new pull request #4468: KAFKA-6468 Read replication-offset-checkpoint once
URL: https://github.com/apache/kafka/pull/4468
 
 
   Only read the high watermark checkpoint
   file (replication-offset-checkpoint) once. Before this patch, this file
   was read every time a replica was created while handling a
   LeaderAndIsrRequest, i.e. once per partition. See
   kafka.cluster.Partition#getOrCreateReplica(Int, Boolean).
   
   On my local test cluster of three brokers with around 40k partitions,
   the initial LeaderAndIsrRequest refers to every partition in the
   cluster, and it can take 20 to 30 minutes to create all of the replicas
   because the replication-offset-checkpoint is nearly 2MB.
   
   Changing this code so that we only read this file once on startup
   reduces the time to create all replicas to around one minute.
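   To make the shape of the change concrete, here is a minimal Java sketch
   (not the actual patch, which changes Scala code in kafka.cluster.Partition)
   contrasting the two strategies. The class and method names
   (HighWatermarkInit, readPerReplica, readOnce) are hypothetical, the
   checkpoint loader is abstracted as a Supplier, and defaulting a partition
   that is absent from the checkpoint to offset 0 is an assumption of the
   sketch. With N partitions in the initial LeaderAndIsrRequest, the first
   variant loads and parses the file N times; the second loads it once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch, not Kafka code. Keys are "topic-partition" strings and
// the checkpoint loader is abstracted as a Supplier so the two strategies can
// be compared without real disk I/O.
final class HighWatermarkInit {

    // Before (as described in KAFKA-6468): the whole checkpoint is loaded
    // again for every replica created, i.e. partitions.size() loads in total.
    static Map<String, Long> readPerReplica(List<String> partitions,
                                            Supplier<Map<String, Long>> loadCheckpoint) {
        Map<String, Long> initialHw = new HashMap<>();
        for (String tp : partitions) {
            Map<String, Long> checkpoint = loadCheckpoint.get(); // full load per replica
            // Assumption: a partition missing from the checkpoint starts at offset 0.
            initialHw.put(tp, checkpoint.getOrDefault(tp, 0L));
        }
        return initialHw;
    }

    // After: load the checkpoint once, then answer every lookup from memory.
    static Map<String, Long> readOnce(List<String> partitions,
                                      Supplier<Map<String, Long>> loadCheckpoint) {
        Map<String, Long> checkpoint = loadCheckpoint.get(); // single load
        Map<String, Long> initialHw = new HashMap<>();
        for (String tp : partitions) {
            initialHw.put(tp, checkpoint.getOrDefault(tp, 0L));
        }
        return initialHw;
    }
}
```

   Both variants produce the same initial high watermarks; only the number
   of loads differs, and each per-partition lookup in the second variant is
   a hash-map lookup instead of a full parse of the file.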
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   



> Replication high watermark checkpoint file read for every LeaderAndIsrRequest
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-6468
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6468
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Kyle Ambroff-Kao
>            Assignee: Kyle Ambroff-Kao
>            Priority: Major
>
> The high watermark for each partition in a given log directory is written to disk every _replica.high.watermark.checkpoint.interval.ms_
> milliseconds. This checkpoint file is used to create replicas when joining the cluster.
> [https://github.com/apache/kafka/blob/b73c765d7e172de4742a3aa023d5a0a4b7387247/core/src/main/scala/kafka/cluster/Partition.scala#L180]
> Unfortunately this file is read every time kafka.cluster.Partition#getOrCreateReplica
> is invoked. For most clusters this isn't a big deal, but for a small cluster with lots of
> partitions all of the reads of this file really add up.
> On my local test cluster of three brokers with around 40k partitions, the initial LeaderAndIsrRequest
> refers to every partition in the cluster, and it can take 20 to 30 minutes to create all of
> the replicas because the _replication-offset-checkpoint_ is nearly 2MB.
> Changing this code so that we only read this file once on startup reduces the time to
> create all replicas to around one minute.
> Credit to [~onurkaraman] for finding this one.
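For reference, a sketch of turning the checkpoint file into an in-memory map once, so that later per-partition lookups never touch disk. It assumes the plain-text layout used by Kafka's offset checkpoint files as we understand it (a version line, an entry-count line, then one "topic partition offset" line per partition); verify this against the broker version in use. CheckpointParser and parse are hypothetical names, not Kafka APIs, and keying entries as "topic-partition" strings is a simplification of the sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Kafka API. Parses the assumed checkpoint layout:
//   line 0:  format version (only 0 is accepted here)
//   line 1:  number of entries that follow
//   line 2+: "<topic> <partition> <offset>"
final class CheckpointParser {

    static Map<String, Long> parse(List<String> lines) {
        if (lines.size() < 2) {
            throw new IllegalArgumentException("checkpoint is missing its header lines");
        }
        int version = Integer.parseInt(lines.get(0).trim());
        if (version != 0) {
            throw new IllegalArgumentException("unsupported checkpoint version " + version);
        }
        int expected = Integer.parseInt(lines.get(1).trim());
        if (expected < 0 || lines.size() < 2 + expected) {
            throw new IllegalArgumentException(
                "expected " + expected + " entries, found " + (lines.size() - 2));
        }
        Map<String, Long> offsets = new HashMap<>();
        for (int i = 2; i < 2 + expected; i++) {
            String[] fields = lines.get(i).trim().split("\\s+");
            if (fields.length != 3) {
                throw new IllegalArgumentException("malformed entry: " + lines.get(i));
            }
            // Key as "topic-partition" for direct lookup during replica creation.
            offsets.put(fields[0] + "-" + fields[1], Long.parseLong(fields[2]));
        }
        return offsets;
    }
}
```

In a broker-like setting the lines would come from a single Files.readAllLines(path) call at startup, and the returned map would then serve every replica lookup instead of re-reading the file per partition.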



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
