kafka-jira mailing list archives

From "Andreas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-6442) Catch 22 with cluster rebalancing
Date Thu, 11 Jan 2018 17:12:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas updated KAFKA-6442:
---------------------------
    Description: 
PS. I classified this as a bug because I think the cluster should not get stuck in this situation;
apologies if that is wrong.

Hi,
How I ended up in this situation is a bit difficult to explain, so I will skip that part
and go straight to the problem.

Some of the brokers in my cluster are permanently gone. As a result, some partitions had
offline leaders, so I used {{kafka-reassign-partitions.sh}} to rebalance my topics, and for
the most part that worked. It did not work for partitions whose leader, replicas and ISR
were all on the lost brokers. Those got stuck halfway through the reassignment, and their
state now looks like this:

{{Topic: topicA      Partition: 32      Leader: -1      Replicas: 1,6,2,7,3,8      Isr: }}
(1,2,3 are legit, 6,7,8 permanently gone)
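
For illustration only, the state line above can be parsed and turned into the kind of reassignment file that {{kafka-reassign-partitions.sh}} reads via {{--reassignment-json-file}}. This is a minimal sketch: the helper names are hypothetical, the set of live brokers (1, 2, 3) is taken from the note above, and it only shows the intended target assignment. It does not get around the tool refusing to run while an earlier reassignment is still pending.

```python
import json
import re

# Matches one partition line in the format shown above.
DESCRIBE_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_describe_line(line):
    """Parse a partition state line into a dict (hypothetical helper)."""
    match = DESCRIBE_LINE.search(line)
    if match is None:
        raise ValueError("unrecognised line: %r" % line)

    def broker_ids(field):
        # An empty field (e.g. an empty ISR) yields an empty list.
        return [int(x) for x in field.split(",") if x]

    return {
        "topic": match.group("topic"),
        "partition": int(match.group("partition")),
        "leader": int(match.group("leader")),
        "replicas": broker_ids(match.group("replicas")),
        "isr": broker_ids(match.group("isr")),
    }


def reassignment_for_live_brokers(state, live_brokers):
    """Build a reassignment document that keeps only the live replicas."""
    live = [b for b in state["replicas"] if b in live_brokers]
    return {
        "version": 1,
        "partitions": [
            {
                "topic": state["topic"],
                "partition": state["partition"],
                "replicas": live,
            }
        ],
    }


line = "Topic: topicA      Partition: 32      Leader: -1      Replicas: 1,6,2,7,3,8      Isr: "
state = parse_describe_line(line)
# Assumption from the note above: brokers 1, 2 and 3 are the surviving ones.
plan = reassignment_for_live_brokers(state, live_brokers={1, 2, 3})
print(json.dumps(plan))
```

The parsed state makes the problem visible: the leader is -1 and the ISR is empty, while only half of the listed replicas are on brokers that still exist.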

So the first catch-22 is that I cannot elect a new leader, because the leader has to be
elected from the ISR, and I cannot rebuild the ISR because the partition has no leader.

The second catch-22 is that I cannot rerun {{kafka-reassign-partitions.sh}}, because the previous
reassignment is supposedly still in progress. I also cannot increase the number of partitions to
make up for the now permanently offline ones, because that fails with {{Error while executing
topic command requirement failed: All partitions should have the same number of replicas.}},
and I cannot fix that either, because doing so would require running {{kafka-reassign-partitions.sh}}.
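
As a rough illustration of what that error message describes, the sketch below checks that every partition of a topic lists the same number of replicas. It is not Kafka's code, and the sample assignment is hypothetical: partition 31 is assumed to be healthy with three replicas, while partition 32 still carries the six replicas from the stuck reassignment. Whether this mismatch is the actual trigger in this cluster is an assumption.

```python
def check_uniform_replication(assignment):
    """Raise ValueError if partitions list differing numbers of replicas."""
    sizes = {len(replicas) for replicas in assignment.values()}
    if len(sizes) > 1:
        raise ValueError(
            "requirement failed: All partitions should have the same "
            "number of replicas."
        )


# Hypothetical assignment: partition id -> replica broker ids.
assignment = {
    31: [1, 2, 3],           # assumed healthy partition
    32: [1, 6, 2, 7, 3, 8],  # stuck halfway through reassignment
}

try:
    check_uniform_replication(assignment)
    failed = False
except ValueError as err:
    failed = True
    print(err)
```

Under these assumptions the check fails for as long as partition 32 keeps its six-replica list, which fits the circular dependency described above.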

Is there a way to recover from such a situation? 


> Catch 22 with cluster rebalancing
> ---------------------------------
>
>                 Key: KAFKA-6442
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6442
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>            Reporter: Andreas



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
