kafka-dev mailing list archives

From Luke Forehand <luke.foreh...@networkedinsights.com>
Subject RE: replicas have different earliest offset
Date Wed, 28 Aug 2013 23:59:44 GMT
Jay, great information, thank you.  I am in a testing phase, so I have been continually
resetting the commit offsets of my consumers before re-running consumer performance tests.
I realize now that my retention policy was set to 7 days, and that I had added 3 new brokers
at day 5 and reassigned partitions to them.  So the partitions owned by the original broker 0
have rolled, but reassigning partitions to brokers 1, 2, and 3 has reset the retention clock
for those partitions.  For the sake of consistency, maybe the current state of the retention
policy could be sent to the new broker during partition reassignment.  That way, partitions
on brokers 1, 2, and 3 would roll at roughly the same time as the partitions on broker 0.
Although, as you said, it's a lower bound and perhaps not that important (just slightly
confusing when a noob is trying to spot-check the validity of a replica).  In the meantime
I will disable the retention policy and start consuming at an offset that is within the
range of all replicas.  Thank you again!
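
(For reference, a rough sketch of what I mean by "an offset in the range of all
replicas", against the 0.8 SimpleConsumer javaapi.  The broker host/port values
and the replica list below are placeholders; a real version would discover the
replicas for each partition via a TopicMetadataRequest:)

import java.util.HashMap;
import java.util.Map;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class SafeStartOffset {

    // Ask a single replica for the earliest offset it still has for a partition.
    static long earliestOffset(String host, int port, String topic, int partition) {
        SimpleConsumer consumer = new SimpleConsumer(host, port, 100000, 64 * 1024, "offset-check");
        try {
            Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
                new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
            info.put(new TopicAndPartition(topic, partition),
                new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.EarliestTime(), 1));
            OffsetResponse response = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                info, kafka.api.OffsetRequest.CurrentVersion(), "offset-check"));
            return response.offsets(topic, partition)[0];
        } finally {
            consumer.close();
        }
    }

    public static void main(String[] args) {
        // Placeholder replica endpoints for partition 0 of "feed".
        String[][] replicas = {{"broker0", "9092"}, {"broker1", "9092"}};
        long safe = 0L;
        for (String[] r : replicas) {
            // The max of the per-replica earliest offsets is readable on every replica.
            safe = Math.max(safe, earliestOffset(r[0], Integer.parseInt(r[1]), "feed", 0));
        }
        System.out.println("safe starting offset for feed-0: " + safe);
    }
}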

Luke Forehand | NetworkedInsights.com | Software Engineer

From: Jay Kreps <jay.kreps@gmail.com>
Sent: Wednesday, August 28, 2013 5:29 PM
To: dev@kafka.apache.org
Subject: Re: replicas have different earliest offset

On a single server, our retention window is always approximate and a lower
bound on what is retained, since we only discard full log segments at a time.
That is, if you say you want to retain 100GB and have a 1GB segment size,
we will discard the oldest segment only when doing so would not bring the
retained data below 100GB (and similarly with time-based retention).
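
(To make the rule concrete, here is a toy sketch of the size-based check.  This
illustrates the behavior described above; it is not Kafka's actual cleanup code:)

import java.util.ArrayDeque;
import java.util.Deque;

public class RetentionSketch {
    // segments holds per-segment sizes in bytes, oldest first.
    // The oldest segment is dropped only while the data left behind still
    // meets the retention target, which makes retention a lower bound.
    static void enforceSizeRetention(Deque<Long> segments, long retentionBytes) {
        long total = segments.stream().mapToLong(Long::longValue).sum();
        while (!segments.isEmpty() && total - segments.peekFirst() >= retentionBytes) {
            total -= segments.pollFirst();
        }
    }
}

With 1GB segments and a 100GB target, this leaves somewhere between 100GB and
101GB on disk, which is why the window is approximate and a lower bound.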

Between servers no attempt is made to synchronize the discard of data. That
is, it is likely that all replicas will discard at roughly the same time
but this is purely a local computation for each of them. Since it is
approximate and a lower bound it does not seem useful to try to synchronize
this further.

If your consumers are lagging so close to the edge of the retention window
that they may actually fall off, that is a problem.  Indeed, even in the
absence of a leader change, it is likely that if you are lagging this much
you will eventually fall off the end of the retention window on the leader.
So this is either a problem of the retention being too small (double it) or
of the consumer being fundamentally unable to keep up (in which case no
amount of retention will help).
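
(For reference, these are the 0.8-era broker settings in server.properties
that this maps to; the values below are purely illustrative:)

# Time-based retention, doubled from a 7-day window:
log.retention.hours=336
# Segments are the unit of deletion, so the window is only approximate:
log.segment.bytes=1073741824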


On Wed, Aug 28, 2013 at 2:51 PM, Luke Forehand <
luke.forehand@networkedinsights.com> wrote:

> I'm running into strange behavior when testing failure scenarios.  I have
> 4 brokers and 8 partitions for a topic called "feed".  I wrote a piece of
> code that prints out the partitionId, leaderId, and earliest offset for
> each partition.
> Here are the earliest offsets reported by each partition leader:
> partition:0 leader:0 offset: 1676913
> partition:1 leader:1 offset: 0
> partition:2 leader:2 offset: 0
> partition:3 leader:0 offset: 1676760
> partition:4 leader:0 offset: 1676635
> partition:5 leader:1 offset: 0
> partition:6 leader:2 offset: 0
> partition:7 leader:0 offset: 1676101
> I then kill broker 0 (using kill <pid>) and re-run my program:
> partition:0 leader:1 offset: 0
> partition:1 leader:1 offset: 0
> partition:2 leader:2 offset: 0
> partition:3 leader:3 offset: 0
> partition:4 leader:1 offset: 0
> partition:5 leader:1 offset: 0
> partition:6 leader:2 offset: 0
> partition:7 leader:1 offset: 0
> As you can see, leadership has moved off broker 0 for every partition it
> was leading.  However, the earliest offset has also changed.  I was under
> the impression that a replica must have the same offset range as the
> original leader; otherwise it would confuse the consumer of the partition.
> For example, during a failover test I ran into an issue where my consumer
> requested an offset from the new leader that didn't exist (it was earlier
> than the earliest offset in that partition).  Can anybody explain what is
> happening?
> Here is my code that prints the leader partition offset information:
> https://gist.github.com/lukeforehand/c37e22aea7192e00fff5
> Thanks,
> Luke
