kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: Trouble recovering after a crashed broker
Date Tue, 07 Jan 2014 16:34:56 GMT
If you want replication, you need to specify the replication factor, either
via default.replication.factor for auto-created topics or explicitly when
creating topics manually.
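For reference, both options look roughly like this on an 0.8 cluster (the exact script name is an assumption; it varied across 0.8 point releases and the tool later became kafka-topics.sh with a --replication-factor flag):

```shell
# server.properties: replication factor applied to auto-created topics
#   default.replication.factor=3

# Or set it explicitly when creating the topic (0.8-era CLI):
bin/kafka-create-topic.sh --zookeeper zkhost:2181 \
    --topic clicks --partition 3 --replica 3
```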

Thanks,

Jun


On Tue, Jan 7, 2014 at 1:17 AM, Vincent Rischmann <vincent@rischmann.fr> wrote:

> Hi,
>
> this is the output of list topic:
>
> topic: clicks partition: 0 leader: 1 replicas: 1 isr: 1
> topic: clicks partition: 1 leader: 3 replicas: 3 isr: 3
> topic: clicks partition: 2 leader: 1 replicas: 1 isr: 1
> topic: visits partition: 0 leader: 3 replicas: 3 isr: 3
> topic: visits partition: 1 leader: 2 replicas: 2 isr: 2
> topic: visits partition: 2 leader: 3 replicas: 3 isr: 3
> topic: stats.live.test partition: 0 leader: 3 replicas: 3,1,2 isr: 3,2,1
> topic: stats.live.test partition: 1 leader: 2 replicas: 1,2,3 isr: 2,3,1
> topic: stats.live.test partition: 2 leader: 2 replicas: 2,3,1 isr: 2,3,1
>
> The topic causing problems is "clicks", and the partitions requested on the
> crashed broker are 0 and 2.
> Given the output of list topic, this means that those 2 partitions are
> permanently lost right now, right?
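Spotting such partitions mechanically from the list-topic output is straightforward; a minimal sketch in Python (assuming the exact `key: value` layout shown above, which may differ between releases):

```python
# Flag partitions whose replica list names only one broker: if that
# broker's data directory is wiped, those partitions are gone for good.
def unreplicated(list_topic_output):
    lost = []
    for line in list_topic_output.strip().splitlines():
        fields = line.split()
        # layout: topic: NAME partition: N leader: X replicas: A,B,... isr: ...
        topic = fields[1]
        partition = int(fields[3])
        replicas = fields[7].split(",")
        if len(replicas) < 2:
            lost.append((topic, partition))
    return lost

output = """\
topic: clicks partition: 0 leader: 1 replicas: 1 isr: 1
topic: clicks partition: 1 leader: 3 replicas: 3 isr: 3
topic: stats.live.test partition: 0 leader: 3 replicas: 3,1,2 isr: 3,2,1"""
print(unreplicated(output))  # [('clicks', 0), ('clicks', 1)]
```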
>
> I thought all partitions were replicated, just like for the topic
> 'stats.live.test', but apparently I screwed up when creating the topics. I
> should have checked that first.
>
> Thanks for your help.
>
>
> 2014/1/6 Jun Rao <junrao@gmail.com>
>
> > How many replicas do you have on that topic? What's the output of list
> > topic?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Jan 6, 2014 at 1:45 AM, Vincent Rischmann <vincent@rischmann.fr> wrote:
> >
> > > Hi,
> > >
> > > yes, I'm seeing the errors on the crashed broker.
> > >
> > > My controller.log file only contains the following:
> > >
> > > [2014-01-03 09:41:01,794] INFO [ControllerEpochListener on 1]: Initialized
> > > controller epoch to 11 and zk version 10
> > > (kafka.controller.ControllerEpochListener)
> > > [2014-01-03 09:41:01,812] INFO [Controller 1]: Controller starting up
> > > (kafka.controller.KafkaController)
> > > [2014-01-03 09:41:02,082] INFO [Controller 1]: Controller startup complete
> > > (kafka.controller.KafkaController)
> > >
> > > Since Friday, nothing has changed and the broker has generated multiple
> > > gigabytes of traces in server.log. One of the latest exceptions looks
> > > like this:
> > >
> > > Request for offset 787449 but we only have log segments in the range 0 to
> > > 163110.
> > >
> > > The range has increased since Friday (it was "0 to 19372"), does this mean
> > > the broker is actually catching up?
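A growing upper bound does suggest the replica is refetching from the leader. A rough progress estimate can be derived from two samples of the follower's log-end offset; a back-of-envelope sketch using the numbers in this thread (the roughly-three-day sampling interval and treating the requested offset 787449 as the leader's log end are both assumptions):

```python
def catch_up_eta(sample1, sample2, leader_end, interval_s):
    """Estimate seconds until a recovering follower reaches the leader's
    log-end offset, given two of its own offset samples taken
    interval_s seconds apart."""
    rate = (sample2 - sample1) / interval_s  # offsets fetched per second
    if rate <= 0:
        return None  # not making progress
    return (leader_end - sample2) / rate

# Friday the follower's range ended at 19372; today it ends at 163110.
# Assuming ~3 days (259200 s) between samples and a leader log end of
# 787449 (the offset consumers are asking for):
eta = catch_up_eta(19372, 163110, 787449, 259200)
print(round(eta / 3600, 1), "hours remaining")
```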
> > >
> > >
> > > Thanks for your help.
> > >
> > >
> > >
> > >
> > > 2014/1/3 Jun Rao <junrao@gmail.com>
> > >
> > > > If a broker crashes and restarts, it will catch up the missing data
> > > > from the leader replicas. Normally, when this broker is catching up, it
> > > > won't be serving any client requests though. Are you seeing those errors
> > > > on the crashed broker? Also, you are not supposed to see
> > > > OffsetOutOfRangeException with just one broker failure with 3 replicas.
> > > > Do you see the following in the controller log?
> > > >
> > > > "No broker in ISR is alive for ... There's potential data loss."
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Jan 3, 2014 at 1:23 AM, Vincent Rischmann <zecmerquise@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > We have a cluster of three 0.8 brokers, and this morning one of the
> > > > > brokers crashed. It is a test broker, and we stored the logs in
> > > > > /tmp/kafka-logs. All topics in use are replicated on the three brokers.
> > > > >
> > > > > You can guess the problem: when the broker rebooted, it wiped all the
> > > > > data in the logs.
> > > > >
> > > > > The producers and consumers are fine, but the broker with the wiped
> > > > > data keeps generating a lot of exceptions, and I don't really know
> > > > > what to do to recover.
> > > > >
> > > > > Example exception:
> > > > >
> > > > > [2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing
> > > > > fetch request for partition [topic,0] offset 814798 from consumer with
> > > > > correlation id 0 (kafka.server.KafkaApis)
> > > > > kafka.common.OffsetOutOfRangeException: Request for offset 814798 but
> > > > > we only have log segments in the range 0 to 19372.
> > > > >
> > > > > There are a lot of them, something like 10+ per second. I (maybe
> > > > > wrongly) assumed that the broker would catch up; if that's the case,
> > > > > how can I see the progress?
> > > > >
> > > > > In general, what is the recommended way to bring back a broker with
> > > > > wiped data in a cluster?
> > > > >
> > > > > Thanks.
> > > > >
> > > >
> > >
> >
>
