kafka-users mailing list archives

From Vincent Rischmann <vinc...@rischmann.fr>
Subject Re: Trouble recovering after a crashed broker
Date Mon, 06 Jan 2014 09:45:46 GMT
Hi,

Yes, I'm seeing the errors on the crashed broker.

My controller.log file only contains the following:

[2014-01-03 09:41:01,794] INFO [ControllerEpochListener on 1]: Initialized
controller epoch to 11 and zk version 10
(kafka.controller.ControllerEpochListener)
[2014-01-03 09:41:01,812] INFO [Controller 1]: Controller starting up
(kafka.controller.KafkaController)
[2014-01-03 09:41:02,082] INFO [Controller 1]: Controller startup complete
(kafka.controller.KafkaController)

Since Friday, nothing has changed and the broker has generated multiple
gigabytes of traces in server.log. One of the most recent exceptions looks
like this:

Request for offset 787449 but we only have log segments in the range 0 to
163110.

The range has increased since Friday (it was "0 to 19372"). Does this mean
the broker is actually catching up?
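
In case it helps anyone else, here is a rough sketch (not an official Kafka
tool) of a script that follows server.log and prints the upper bound of the
"log segments in the range ... to ..." errors as it grows. The log path and
the exact line format are just assumptions based on the exception above:

#!/usr/bin/env python
# Rough sketch: follow server.log and print the upper bound of the
# "log segments in the range X to Y" errors as it grows, as a crude way
# to see whether the wiped broker is still catching up from the leaders.
# Note: it does not distinguish between partitions, so it is only a rough
# indicator.
import re
import sys
import time

RANGE_RE = re.compile(r"log segments in the range (\d+) to (\d+)")

def follow(path):
    # behave roughly like `tail -f`: yield lines as they are appended
    with open(path) as f:
        f.seek(0, 2)  # start at the current end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(1.0)
                continue
            yield line

last_high = None
log_path = sys.argv[1] if len(sys.argv) > 1 else "server.log"
for line in follow(log_path):
    match = RANGE_RE.search(line)
    if match is None:
        continue
    high = int(match.group(2))
    if last_high is None or high > last_high:
        delta = 0 if last_high is None else high - last_high
        print("upper bound of range now %d (+%d)" % (high, delta))
        last_high = high

If the second number in those messages keeps climbing, it would suggest the
replica fetchers are still copying data from the leaders.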


Thanks for your help.




2014/1/3 Jun Rao <junrao@gmail.com>

> If a broker crashes and restarts, it will catch up the missing data from
> the leader replicas. Normally, when this broker is catching up, it won't be
> serving any client requests though. Are you seeing those errors on the
> crashed broker? Also, you are not supposed to see OffsetOutOfRangeException
> with just one broker failure with 3 replicas. Do you see the following in
> the controller log?
>
> "No broker in ISR is alive for ... There's potential data loss."
>
> Thanks,
>
> Jun
>
> On Fri, Jan 3, 2014 at 1:23 AM, Vincent Rischmann <zecmerquise@gmail.com> wrote:
>
> > Hi all,
> >
> > We have a cluster of three 0.8 brokers, and this morning one of the brokers
> > crashed.
> > It is a test broker, and we stored the logs in /tmp/kafka-logs. All topics
> > in use are replicated on the three brokers.
> >
> > You can guess the problem: when the broker rebooted, it wiped all the data
> > in the logs.
> >
> > The producers and consumers are fine, but the broker with the wiped data
> > keeps generating a lot of exceptions, and I don't really know what to do
> > to recover.
> >
> > Example exception:
> >
> > [2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing fetch
> > request for partition [topic,0] offset 814798 from consumer with
> > correlation id 0 (kafka.server.KafkaApis)
> > kafka.common.OffsetOutOfRangeException: Request for offset 814798 but we
> > only have log segments in the range 0 to 19372.
> >
> > There are a lot of them, something like 10+ per second. I (maybe wrongly)
> > assumed that the broker would catch up; if that's the case, how can I see
> > the progress?
> >
> > In general, what is the recommended way to bring back a broker with wiped
> > data in a cluster?
> >
> > Thanks.
> >
>
