zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Jaton <benjamin.ja...@gmail.com>
Subject Re: CRC check failed
Date Sat, 15 Aug 2015 05:27:09 GMT
I got to the bottom of it, this CRC error happens when the preAllocSize of
the transaction logs is smaller than the size of the ZK nodes saved.

I tried with preAllocSize=8mb , saving a couple nodes of 15mb is enough to
reproduce the error.

On Friday, August 14, 2015, Benjamin Jaton <benjamin.jaton@gmail.com> wrote:

> OK, so those transactions are probably noise then.
>
> I changed the LogFormatter to not stop when it detects CRC errors so that
> I could see what nodes were in error.
> It turns out that those errors appear on larger nodes (>5MB).
> We do override jute.maxbuffer to more than the default 1MB.
>
> Could there be some unexpected behavior with larger nodes?
>
> On Fri, Aug 14, 2015 at 11:40 AM, Flavio Junqueira <
> fpjunqueira@yahoo.com.invalid
> <javascript:_e(%7B%7D,'cvml','fpjunqueira@yahoo.com.invalid');>> wrote:
>
>> Note that the zxid is different in each of the lines.
>>
>> -Flavio
>>
>> > On 14 Aug 2015, at 19:27, Benjamin Jaton <benjamin.jaton@gmail.com
>> <javascript:_e(%7B%7D,'cvml','benjamin.jaton@gmail.com');>> wrote:
>> >
>> > I see a lot of those:
>> >
>> > *8/11/15 11:57:57 AM PDT session 0x14efe8c1ba2000b cxid 0x1ba004 zxid
>> > 0x78b5a setData '/path/to/node,(....),60221*
>> > immediately followed by
>> > *8/11/15 11:57:57 AM PDT session 0x14efe8c1ba2000b cxid 0x1ba00a zxid
>> > 0x78b5b error -110*
>> >
>> > Is -110 a NodeExists exception? A setData should not be in error if the
>> > node already exist no?
>> >
>> >
>> > On Fri, Aug 14, 2015 at 9:50 AM, Benjamin Jaton <
>> benjamin.jaton@gmail.com
>> <javascript:_e(%7B%7D,'cvml','benjamin.jaton@gmail.com');>>
>> > wrote:
>> >
>> >> This is what I have when I run the LogFormatter on the last tlog:
>> >>
>> >> *Exception in thread "main" java.io.IOException: CRC doesn't match
>> >> 1863394799 vs 480060806*
>> >>
>> >> Do you have any idea on what could cause this to happen?
>> >>
>> >> On Thu, Aug 13, 2015 at 4:16 PM, Flavio Junqueira <fpj@apache.org
>> <javascript:_e(%7B%7D,'cvml','fpj@apache.org');>> wrote:
>> >>
>> >>> Hi Benjamin,
>> >>>
>> >>> Have you tried LogFormatter to check the file? I guess it won't work
>> if
>> >>> your file is really corrupt, but it might be worth a shot.
>> Unfortunately,
>> >>> I'm not aware of a way around it other than deleting the file and and
>> >>> losing at least part of the transactions in the log.
>> >>>
>> >>> -Flavio
>> >>>
>> >>>> On 13 Aug 2015, at 17:27, Benjamin Jaton <benjamin.jaton@gmail.com
>> <javascript:_e(%7B%7D,'cvml','benjamin.jaton@gmail.com');>>
>> >>> wrote:
>> >>>>
>> >>>> Hello,
>> >>>>
>> >>>> Does anybody know how those "CRC check failed" errors can occur?
>> >>>>
>> >>>> Apparently the data has been corrupted, what would be the most likely
>> >>>> scenario for this to happen?
>> >>>>
>> >>>> java.io.IOException: CRC check failed
>> >>>> at
>> >>>>
>> >>>
>> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:612)
>> >>>>
>> >>>> at
>> >>>>
>> >>>
>> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
>> >>>>
>> >>>> at
>> >>>
>> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>> >>>> at
>> >>>>
>> >>>
>> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272)
>> >>>>
>> >>>> at
>> >>>>
>> >>>
>> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399)
>> >>>>
>> >>>> at
>> >>>>
>> >>>
>> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
>> >>>>
>> >>>> at
>> >>>>
>> >>>
>> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113)
>> >>>>
>> >>>> at
>> >>>>
>> >>>
>> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>> >>>>
>> >>>> at
>> >>>>
>> >>>
>> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>> >>>>
>> >>>>
>> >>>> As a result my ZK can't be started, is it recoverable?
>> >>>>
>> >>>> Thanks,
>> >>>> Ben
>> >>>>
>> >>>> ---------- Forwarded message ----------
>> >>>> From: Apache Jenkins Server <jenkins@builds.apache.org
>> <javascript:_e(%7B%7D,'cvml','jenkins@builds.apache.org');>>
>> >>>> Date: Sat, Apr 26, 2014 at 11:35 PM
>> >>>> Subject: ZooKeeper_branch33_solaris - Build # 867 - Still Failing
>> >>>> To: dev@zookeeper.apache.org
>> <javascript:_e(%7B%7D,'cvml','dev@zookeeper.apache.org');>
>> >>>>
>> >>>>
>> >>>> See https://builds.apache.org/job/ZooKeeper_branch33_solaris/867/
>> >>>>
>> >>>>
>> >>>
>> ###################################################################################
>> >>>> ########################## LAST 60 LINES OF THE CONSOLE
>> >>>> ###########################
>> >>>> [...truncated 301 lines...]
>> >>>>   [junit] java.io.IOException: CRC check failed
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:611)
>> >>>>   [junit]     at
>> >>>> org.apache.zookeeper.server.CRCTest.testChecksums(CRCTest.java:165)
>> >>>>   [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> >>>> Method)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >>>>   [junit]     at java.lang.reflect.Method.invoke(Method.java:597)
>> >>>>   [junit]     at junit.framework.TestCase.runTest(TestCase.java:168)
>> >>>>   [junit]     at junit.framework.TestCase.runBare(TestCase.java:134)
>> >>>>   [junit]     at
>> >>> junit.framework.TestResult$1.protect(TestResult.java:110)
>> >>>>   [junit]     at
>> >>>> junit.framework.TestResult.runProtected(TestResult.java:128)
>> >>>>   [junit]     at junit.framework.TestResult.run(TestResult.java:113)
>> >>>>   [junit]     at junit.framework.TestCase.run(TestCase.java:124)
>> >>>>   [junit]     at
>> junit.framework.TestSuite.runTest(TestSuite.java:232)
>> >>>>   [junit]     at junit.framework.TestSuite.run(TestSuite.java:227)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
>> >>>>   [junit]     at
>> >>>> junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)
>> >>>>   [junit] 2014-04-27 06:35:01,089 - INFO  [main:CRCTest@68] -
>> FINISHED
>> >>>> testChecksums
>> >>>>   [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 54.234
>> >>> sec
>> >>>>   [junit] java.io.FileNotFoundException:
>> >>>>
>> >>>
>> /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/junitvmwatcher1383325470.properties
>> >>>> (No such file or directory)
>> >>>>   [junit]     at java.io.FileInputStream.open(Native Method)
>> >>>>   [junit]     at
>> >>> java.io.FileInputStream.<init>(FileInputStream.java:120)
>> >>>>   [junit]     at java.io.FileReader.<init>(FileReader.java:55)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeAsForked(JUnitTask.java:1028)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:817)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeOrQueue(JUnitTask.java:1657)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:764)
>> >>>>   [junit]     at
>> >>>> org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
>> >>>>   [junit]     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown
>> >>>> Source)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >>>>   [junit]     at java.lang.reflect.Method.invoke(Method.java:597)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
>> >>>>   [junit]     at org.apache.tools.ant.Task.perform(Task.java:348)
>> >>>>   [junit]     at org.apache.tools.ant.Target.execute(Target.java:357)
>> >>>>   [junit]     at
>> >>> org.apache.tools.ant.Target.performTasks(Target.java:385)
>> >>>>   [junit]     at
>> >>>> org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
>> >>>>   [junit]     at
>> >>>> org.apache.tools.ant.Project.executeTarget(Project.java:1298)
>> >>>>   [junit]     at
>> >>>>
>> >>>
>> org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
>> >>>>   [junit]     at
>> >>>> org.apache.tools.ant.Project.executeTargets(Project.java:1181)
>> >>>>   [junit]     at org.apache.tools.ant.Main.runBuild(Main.java:698)
>> >>>>   [junit]     at org.apache.tools.ant.Main.startAnt(Main.java:199)
>> >>>>   [junit]     at
>> >>>> org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
>> >>>>   [junit]     at
>> >>>> org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)
>> >>>>   [junit] Running org.apache.zookeeper.server.DataTreeUnitTest
>> >>>>   [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0
sec
>> >>>>
>> >>>> BUILD FAILED
>> >>>>
>> >>>
>> /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build.xml:814:
>> >>>> Process fork failed.
>> >>>>
>> >>>> Total time: 2 minutes 54 seconds
>> >>>> Build step 'Invoke Ant' marked build as failure
>> >>>> [locks-and-latches] Releasing all the locks
>> >>>> [locks-and-latches] All the locks released
>> >>>> Recording test results
>> >>>> Email was triggered for: Failure
>> >>>> Sending email for trigger: Failure
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> ###################################################################################
>> >>>> ############################## FAILED TESTS (if any)
>> >>>> ##############################
>> >>>> 1 tests failed.
>> >>>> FAILED:  org.apache.zookeeper.server.DataTreeUnitTest.unknown
>> >>>>
>> >>>> Error Message:
>> >>>> Forked Java VM exited abnormally. Please note the time in the report
>> >>> does
>> >>>> not reflect the time until the VM exit.
>> >>>>
>> >>>> Stack Trace:
>> >>>> junit.framework.AssertionFailedError: Forked Java VM exited
>> abnormally.
>> >>>> Please note the time in the report does not reflect the time until
>> the
>> >>> VM
>> >>>> exit.
>> >>>
>> >>>
>> >>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message