kafka-jira mailing list archives

From "Wenbing Shen (Jira)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-9458) Kafka crashed in windows environment
Date Fri, 18 Dec 2020 05:57:01 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251530#comment-17251530 ]

Wenbing Shen commented on KAFKA-9458:
-------------------------------------

The current patch is incomplete. When a topic is deleted or a partition is migrated,
the broker still hangs or the log directory is still taken offline. I have attached the following
patch file, which resolved both cases in my own testing:

[^kafka_windows_crash_by_delete_topic_and_Partition_migration]

> Kafka crashed in windows environment
> ------------------------------------
>
>                 Key: KAFKA-9458
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9458
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 2.4.0
>         Environment: Windows Server 2019
>            Reporter: hirik
>            Priority: Critical
>              Labels: windows
>         Attachments: Windows_crash_fix.patch, kafka_windows_crash_by_delete_topic_and_Partition_migration,
logs.zip
>
>
> Hi,
> While I was trying to validate the Kafka retention policy, the Kafka server crashed with
the exception trace below.
> [2020-01-21 17:10:40,475] INFO [Log partition=test1-3, dir=C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka]
Rolled new log segment at offset 1 in 52 ms. (kafka.log.Log)
> [2020-01-21 17:10:40,484] ERROR Error while deleting segments for test1-3 in dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka
(kafka.server.LogDirFailureChannel)
> java.nio.file.FileSystemException: C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex
-> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex.deleted:
The process cannot access the file because it is being used by another process.
> at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
>  at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
>  at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
>  at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
>  at java.base/java.nio.file.Files.move(Files.java:1425)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:795)
>  at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
>  at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1(Log.scala:2206)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1$adapted(Log.scala:2206)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.Log.deleteSegmentFiles(Log.scala:2206)
>  at kafka.log.Log.removeAndDeleteSegments(Log.scala:2191)
>  at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1700)
>  at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17)
>  at kafka.log.Log.maybeHandleIOException(Log.scala:2316)
>  at kafka.log.Log.deleteSegments(Log.scala:1691)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1686)
>  at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1763)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1753)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:982)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:979)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.LogManager.cleanupLogs(LogManager.scala:979)
>  at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:403)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
>  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:830)
>  Suppressed: java.nio.file.FileSystemException: C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex
-> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex.deleted:
The process cannot access the file because it is being used by another process.
> at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
>  at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
>  at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:309)
>  at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
>  at java.base/java.nio.file.Files.move(Files.java:1425)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:792)
>  ... 27 more
> [2020-01-21 17:10:40,495] INFO [ReplicaManager broker=0] Stopping serving replicas in
dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka (kafka.server.ReplicaManager)
> [2020-01-21 17:10:40,495] ERROR Uncaught exception in scheduled task 'kafka-log-retention'
(kafka.utils.KafkaScheduler)
> org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for
test1-3 in dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka
> Caused by: java.nio.file.FileSystemException: C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex
-> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex.deleted:
The process cannot access the file because it is being used by another process.
> at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
>  at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
>  at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
>  at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
>  at java.base/java.nio.file.Files.move(Files.java:1425)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:795)
>  at kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:209)
>  at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:497)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1(Log.scala:2206)
>  at kafka.log.Log.$anonfun$deleteSegmentFiles$1$adapted(Log.scala:2206)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.Log.deleteSegmentFiles(Log.scala:2206)
>  at kafka.log.Log.removeAndDeleteSegments(Log.scala:2191)
>  at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1700)
>  at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.scala:17)
>  at kafka.log.Log.maybeHandleIOException(Log.scala:2316)
>  at kafka.log.Log.deleteSegments(Log.scala:1691)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1686)
>  at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1763)
>  at kafka.log.Log.deleteOldSegments(Log.scala:1753)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:982)
>  at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:979)
>  at scala.collection.immutable.List.foreach(List.scala:305)
>  at kafka.log.LogManager.cleanupLogs(LogManager.scala:979)
>  at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:403)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
>  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>  at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:830)
>  Suppressed: java.nio.file.FileSystemException: C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex
-> C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka\test1-3\00000000000000000000.timeindex.deleted:
The process cannot access the file because it is being used by another process.
> at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
>  at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
>  at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:309)
>  at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
>  at java.base/java.nio.file.Files.move(Files.java:1425)
>  at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:792)
>  ... 27 more
> [2020-01-21 17:10:40,505] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for
partitions HashSet(test1-3, test1-7, test-0, test1-0, test1-1, test1-5, test1-2, test1-8,
test1-4, test1-9, test1-6) (kafka.server.ReplicaFetcherManager)
> [2020-01-21 17:10:40,507] INFO [ReplicaAlterLogDirsManager on broker 0] Removed fetcher
for partitions HashSet(test1-3, test1-7, test-0, test1-0, test1-1, test1-5, test1-2, test1-8,
test1-4, test1-9, test1-6) (kafka.server.ReplicaAlterLogDirsManager)
> [2020-01-21 17:10:40,522] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for
partitions test1-3,test1-7,test-0,test1-0,test1-1,test1-5,test1-2,test1-8,test1-4,test1-9,test1-6
and stopped moving logs for partitions because they are in the failed log directory C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka.
(kafka.server.ReplicaManager)
> [2020-01-21 17:10:40,523] INFO Stopping serving logs in dir C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka
(kafka.log.LogManager)
> [2020-01-21 17:10:40,526] ERROR Shutdown broker because all log dirs in C:\Users\Administrator\Downloads\kafka\bin\windows\..\..\data\kafka
have failed (kafka.log.LogManager)
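
For readers following the trace above: the top-level FileSystemException and the
"Suppressed" one point at two nearby lines of Utils.atomicMoveWithFallback (792 and 795),
which suggests an atomic rename attempt followed by a non-atomic retry, both rejected by
Windows with the same "being used by another process" error. Below is a minimal sketch of
that move-with-fallback pattern using only java.nio.file. It is not Kafka's source code;
the class name MoveSketch and the method name moveWithFallback are hypothetical and chosen
only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative only: a move-with-fallback helper, not Kafka's implementation.
final class MoveSketch {

    // Try an atomic rename first; if it fails, retry with a non-atomic move.
    // If the retry also fails, attach the first failure as a suppressed
    // exception so that both attempts show up in the stack trace.
    static void moveWithFallback(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException atomicFailure) {
            try {
                Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException fallbackFailure) {
                fallbackFailure.addSuppressed(atomicFailure);
                throw fallbackFailure;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Rename an index-like file by appending a ".deleted" suffix, as in the trace.
        Path dir = Files.createTempDirectory("move-sketch");
        Path index = Files.createFile(dir.resolve("00000000000000000000.timeindex"));
        Path deleted = dir.resolve("00000000000000000000.timeindex.deleted");
        moveWithFallback(index, deleted);
        System.out.println("renamed: " + Files.exists(deleted));
    }
}
```

In this pattern the fallback cannot help when both attempts fail for the same reason. On
Windows, renaming a file that still has an open handle or an active memory mapping is
typically refused, so the rename would presumably succeed only once the file has been
closed or unmapped.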



--
This message was sent by Atlassian Jira
(v8.3.4#803005)