kafka-dev mailing list archives

From Cliff Rhyne <crh...@signal.co>
Subject Re: Java high level consumer providing duplicate messages when auto commit is off
Date Wed, 21 Oct 2015 21:20:33 GMT
Hi Kris,

Thanks for the tip.  I'm going to investigate this further.  I checked, and
we have fairly short zk timeouts and run with a smaller memory allocation in
the two environments where we encounter this issue.  I'll let you all know
what I find.
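
For concreteness, here is a minimal sketch of the consumer properties in
play for the 0.8.2.1 high-level consumer; the connect string, group id, and
timeout values are illustrative placeholders, not our actual settings:

```java
// Sketch only: illustrative settings for the 0.8.2.1 high-level consumer.
// All values below are placeholders, not our production configuration.
import java.util.Properties;

public class ConsumerConfigSketch {
    public static Properties build() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");  // placeholder address
        props.put("group.id", "example-group");            // hypothetical group id
        props.put("auto.commit.enable", "false");          // we commit manually
        // If a GC pause exceeds this, ZooKeeper can expire the session and the
        // group rebalances, which re-delivers messages past the last commit.
        props.put("zookeeper.session.timeout.ms", "6000");
        // The iterator throws ConsumerTimeoutException after this much idle time.
        props.put("consumer.timeout.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("zookeeper.session.timeout.ms"));
    }
}
```

The interaction in the comments is the theory we're testing: a GC pause
longer than the ZooKeeper session timeout expires the consumer, triggers a
rebalance, and messages after the last committed offset get re-delivered.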

I saw this ticket https://issues.apache.org/jira/browse/KAFKA-2049 that
seems to be related to the problem (though it would only report that an
issue occurred).  Are there any other open issues that could be worked on to
improve Kafka's handling of this situation?
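
On the trace-logging question from my original message below, this is the
kind of log4j.properties fragment I would expect to need for the 0.8.2.1
client (the appender name and file path are placeholders), in case it helps
anyone else reproduce:

```properties
# Sketch only: route the Kafka client's internal logging (kafka.consumer.*,
# including ConsumerIterator) to its own file at TRACE level.
log4j.appender.kafkaClient=org.apache.log4j.FileAppender
log4j.appender.kafkaClient.File=kafka-client-trace.log
log4j.appender.kafkaClient.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaClient.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
# TRACE on the consumer package surfaces fetch and offset decisions.
log4j.logger.kafka.consumer=TRACE, kafkaClient
log4j.additivity.kafka.consumer=false
```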

Thanks,
Cliff

On Wed, Oct 21, 2015 at 2:53 PM, Kris K <squarekscode@gmail.com> wrote:

> Hi Cliff,
>
> One other case I observed in my environment was GC pauses on one of the
> high-level consumers in the group.
>
> Thanks,
> Kris
>
> On Wed, Oct 21, 2015 at 10:12 AM, Cliff Rhyne <crhyne@signal.co> wrote:
>
> > Hi James,
> >
> > There are two scenarios we run:
> >
> > 1. Multiple partitions with one consumer per partition.  This rarely has
> > starting/stopping of consumers, so the pool is very static.  There is a
> > configured consumer timeout, which is causing the ConsumerTimeoutException
> > to get thrown prior to the test starting.  We handle this exception and
> > then resume consuming.
> > 2. Single partition with one consumer.  This consumer is started by a
> > triggered condition (number of messages pending to be processed in the
> > kafka topic or a schedule).  The consumer is stopped after processing is
> > completed.
> >
> > In both cases, based on my understanding there shouldn't be a rebalance as
> > either a) all consumers are running or b) there's only one consumer /
> > partition.  Also, the same consumer group is used by all consumers in
> > scenario 1 and 2.  Is there a good way to investigate whether rebalances
> > are occurring?
> >
> > Thanks,
> > Cliff
> >
> > On Wed, Oct 21, 2015 at 11:37 AM, James Cheng <jcheng@tivo.com> wrote:
> >
> > > Do you have multiple consumers in a consumer group?
> > >
> > > I think that when a new consumer joins the consumer group, the existing
> > > consumers will stop consuming during the group rebalance, and then when
> > > they start consuming again, they will consume from the last committed
> > > offset.
> > >
> > > You should get more verification on this, though. I might be remembering
> > > wrong.
> > >
> > > -James
> > >
> > > > On Oct 21, 2015, at 8:40 AM, Cliff Rhyne <crhyne@signal.co> wrote:
> > > >
> > > > Hi,
> > > >
> > > > My team and I are looking into a problem where the Java high-level
> > > > consumer provides duplicate messages if we turn auto commit off (using
> > > > version 0.8.2.1 of the server and Java client).  The expected sequence
> > > > of events is:
> > > >
> > > > 1. Start the high-level consumer and initialize a KafkaStream to get a
> > > > ConsumerIterator
> > > > 2. Consume n items (could be 10,000, could be 1,000,000) from the
> > > > iterator
> > > > 3. Commit the new offsets
> > > >
> > > > What we are seeing is that during step 2, some number of the n messages
> > > > are getting returned by the iterator in duplicate (in some cases, we've
> > > > seen n*5 messages consumed).  The problem appears to go away if we turn
> > > > on auto commit (and committing offsets to kafka helped too), but auto
> > > > commit causes conflicts with our offset rollback logic.  The issue
> > > > seems to happen more when we are in our test environment on a
> > > > lower-cost cloud provider.
> > > >
> > > > Diving into the Java and Scala classes, including the
> > > > ConsumerIterator, it's not obvious what event causes a duplicate
> > > > offset to be requested or returned (there's even a loop in that class
> > > > that is supposed to exclude duplicate messages).  I tried turning on
> > > > trace logging, but my log4j config isn't getting the Kafka client logs
> > > > to write out.
> > > >
> > > > Does anyone have suggestions of where to look or how to enable
> > > > logging?
> > > >
> > > > Thanks,
> > > > Cliff
> > >
> >
>
