kafka-dev mailing list archives

From "Apurva Mehta (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-5357) StackOverFlow error in transaction coordinator
Date Thu, 01 Jun 2017 06:57:04 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032547#comment-16032547 ]

Apurva Mehta edited comment on KAFKA-5357 at 6/1/17 6:56 AM:
-------------------------------------------------------------

The stack overflow error occurs because we enter a tight recursive loop when handling
transaction marker write completion. Here is what happens:

# We send the marker.
# Upon success, we try to write the updated transaction metadata to the transaction log to
move it from PrepareXX to CompletedXX state.
# If this append fails, we recursively call `appendTransactionToLog` with the same metadata
update, ad infinitum.
# When there are broker bounces and not enough replicas are available, this can happen
in a tight loop for several tens of seconds, resulting in a stack overflow error.

One fix is to back off and retry rather than retrying in a tight loop, which is what the
client does.
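
To illustrate the difference between the two retry styles, here is a minimal Java sketch. It is not the coordinator code and not necessarily how the fix will be implemented: `AppendRetrySketch`, `appendRecursively`, `appendWithBackoff` and the helper methods are hypothetical names, and the append is modelled as a plain `BooleanSupplier` that returns true on success.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the retry logic described above; not Kafka code.
class AppendRetrySketch {

    // Tight recursive retry: every failed append adds another stack frame,
    // so a long enough outage ends in StackOverflowError.
    static void appendRecursively(BooleanSupplier append) {
        if (!append.getAsBoolean()) {
            appendRecursively(append);
        }
    }

    // Back off and retry: each attempt runs as a fresh scheduled task,
    // so stack depth stays constant however long the outage lasts.
    static void appendWithBackoff(ScheduledExecutorService scheduler,
                                  BooleanSupplier append,
                                  long backoffMs,
                                  Runnable onSuccess) {
        if (append.getAsBoolean()) {
            onSuccess.run();
        } else {
            scheduler.schedule(
                () -> appendWithBackoff(scheduler, append, backoffMs, onSuccess),
                backoffMs, TimeUnit.MILLISECONDS);
        }
    }

    // Returns true if the recursive variant overflows the stack
    // when the append never succeeds.
    static boolean recursiveRetryOverflows() {
        try {
            appendRecursively(() -> false);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    // Runs the back-off variant against an append that fails `failures`
    // times and then succeeds; returns the total number of attempts.
    static int attemptsWithBackoff(int failures, long backoffMs) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);
        try {
            appendWithBackoff(scheduler,
                () -> attempts.incrementAndGet() > failures,
                backoffMs, done::countDown);
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("append never succeeded");
            }
            return attempts.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

In the back-off variant the failed attempt returns before the next one starts, so the number of retries is bounded only by time, not by stack size. A fixed delay is used here for brevity; the appropriate delay for the coordinator is a separate question.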



was (Author: apurva):
The stack overflow error occurs because we enter a tight recursive loop when handling
transaction marker write completion. Here is what happens:

# We send the marker.
# Upon success, we try to write the updated transaction metadata to the transaction log to
move it from PrepareXX to CompletedXX state.
# If this append fails, we recursively call `appendTransactionToLog`, ad infinitum.
# When there are broker bounces and not enough replicas are available, this can happen
in a tight loop for several tens of seconds, resulting in a stack overflow error.

One fix is to back off and retry rather than retrying in a tight loop, which is what the
client does.


> StackOverFlow error in transaction coordinator
> ----------------------------------------------
>
>                 Key: KAFKA-5357
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5357
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.11.0.0
>            Reporter: Apurva Mehta
>            Priority: Blocker
>              Labels: exactly-once
>             Fix For: 0.11.0.0
>
>         Attachments: KAFKA-5357.tar.gz
>
>
> I observed the following in the broker logs: 
> {noformat}
> [2017-06-01 04:10:36,664] ERROR [Replica Manager on Broker 1]: Error processing append operation on partition __transaction_state-37 (kafka.server.ReplicaManager)
> [2017-06-01 04:10:36,667] ERROR [TxnMarkerSenderThread-1]: Error due to (kafka.common.InterBrokerSendThread)
> java.lang.StackOverflowError
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.io.PrintWriter.<init>(PrintWriter.java:116)
>         at java.io.PrintWriter.<init>(PrintWriter.java:100)
>         at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:58)
>         at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
>         at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
>         at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:313)
>         at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:369)
>         at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
>         at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>         at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>         at org.apache.log4j.Category.callAppenders(Category.java:206)
>         at org.apache.log4j.Category.forcedLog(Category.java:391)
>         at org.apache.log4j.Category.error(Category.java:322)
>         at kafka.utils.Logging$class.error(Logging.scala:105)
>         at kafka.server.ReplicaManager.error(ReplicaManager.scala:122)
>         at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:557)
>         at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:505)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:505)
>         at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:346)
>         at kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1.apply$mcV$sp(TransactionStateManager.scala:589)
>         at kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1.apply(TransactionStateManager.scala:570)
>         at kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1.apply(TransactionStateManager.scala:570)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
>         at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:219)
>         at kafka.coordinator.transaction.TransactionStateManager.appendTransactionToLog(TransactionStateManager.scala:564)
>         at kafka.coordinator.transaction.TransactionMarkerChannelManager.kafka$coordinator$transaction$TransactionMarkerChannelManager$$retryAppendCallback$1(TransactionMarkerChannelManager.scala:225)
>         at kafka.coordinator.transaction.TransactionMarkerChannelManager$$anonfun$kafka$coordinator$transaction$TransactionMarkerChannelManager$$retryAppendCallback$1$4.apply(TransactionMarkerChannelManager.scala:225)
>         at kafka.coordinator.transaction.TransactionMarkerChannelManager$$anonfun$kafka$coordinator$transaction$TransactionMarkerChannelManager$$retryAppendCallback$1$4.apply(TransactionMarkerChannelManager.scala:225)
>         at kafka.coordinator.transaction.TransactionStateManager.kafka$coordinator$transaction$TransactionStateManager$$updateCacheCallback$1(TransactionStateManager.scala:561)
>         at kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1$$anonfun$apply$mcV$sp$4.apply(TransactionStateManager.scala:595)
>         at kafka.coordinator.transaction.TransactionStateManager$$anonfun$appendTransactionToLog$1$$anonfun$apply$mcV$sp$4.apply(TransactionStateManager.scala:595)
>         at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:373)
>  {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
