zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Flavio Junqueira <...@apache.org>
Subject Re: CRC check failed
Date Sat, 15 Aug 2015 09:15:40 GMT
Good catch! It sounds like there is a bug when the data in the writing of the txn log. I wonder
if this happens only when the znodes being written or if it hits when there isn't enough space
in the pre-allocated space. Do you want to create a jira for us to fix it?

-Flavio 


> On 15 Aug 2015, at 06:27, Benjamin Jaton <benjamin.jaton@gmail.com> wrote:
> 
> I got to the bottom of it, this CRC error happens when the preAllocSize of
> the transaction logs is smaller than the size of the ZK nodes saved.
> 
> I tried with preAllocSize=8mb , saving a couple nodes of 15mb is enough to
> reproduce the error.
> 
> On Friday, August 14, 2015, Benjamin Jaton <benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>>
wrote:
> 
>> OK, so those transactions are probably noise then.
>> 
>> I changed the LogFormatter to not stop when it detects CRC errors so that
>> I could see what nodes were in error.
>> It turns out that those errors appear on larger nodes (>5MB).
>> We do override jute.maxbuffer to more than the default 1MB.
>> 
>> Could there be some unexpected behavior with larger nodes?
>> 
>> On Fri, Aug 14, 2015 at 11:40 AM, Flavio Junqueira <
>> fpjunqueira@yahoo.com.invalid
>> <javascript:_e(%7B%7D,'cvml','fpjunqueira@yahoo.com.invalid <mailto:fpjunqueira@yahoo.com.invalid>');>>
wrote:
>> 
>>> Note that the zxid is different in each of the lines.
>>> 
>>> -Flavio
>>> 
>>>> On 14 Aug 2015, at 19:27, Benjamin Jaton <benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>
>>> <javascript:_e(%7B%7D,'cvml','benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>');>>
wrote:
>>>> 
>>>> I see a lot of those:
>>>> 
>>>> *8/11/15 11:57:57 AM PDT session 0x14efe8c1ba2000b cxid 0x1ba004 zxid
>>>> 0x78b5a setData '/path/to/node,(....),60221*
>>>> immediately followed by
>>>> *8/11/15 11:57:57 AM PDT session 0x14efe8c1ba2000b cxid 0x1ba00a zxid
>>>> 0x78b5b error -110*
>>>> 
>>>> Is -110 a NodeExists exception? A setData should not be in error if the
>>>> node already exist no?
>>>> 
>>>> 
>>>> On Fri, Aug 14, 2015 at 9:50 AM, Benjamin Jaton <
>>> benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>
>>> <javascript:_e(%7B%7D,'cvml','benjamin.jaton@gmail.com <mailto:benjamin.jaton@gmail.com>');>>
>>>> wrote:
>>>> 
>>>>> This is what I have when I run the LogFormatter on the last tlog:
>>>>> 
>>>>> *Exception in thread "main" java.io.IOException: CRC doesn't match
>>>>> 1863394799 vs 480060806*
>>>>> 
>>>>> Do you have any idea on what could cause this to happen?
>>>>> 
>>>>> On Thu, Aug 13, 2015 at 4:16 PM, Flavio Junqueira <fpj@apache.org
>>> <javascript:_e(%7B%7D,'cvml','fpj@apache.org');>> wrote:
>>>>> 
>>>>>> Hi Benjamin,
>>>>>> 
>>>>>> Have you tried LogFormatter to check the file? I guess it won't work
>>> if
>>>>>> your file is really corrupt, but it might be worth a shot.
>>> Unfortunately,
>>>>>> I'm not aware of a way around it other than deleting the file and
and
>>>>>> losing at least part of the transactions in the log.
>>>>>> 
>>>>>> -Flavio
>>>>>> 
>>>>>>> On 13 Aug 2015, at 17:27, Benjamin Jaton <benjamin.jaton@gmail.com
>>> <javascript:_e(%7B%7D,'cvml','benjamin.jaton@gmail.com');>>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> Does anybody know how those "CRC check failed" errors can occur?
>>>>>>> 
>>>>>>> Apparently the data has been corrupted, what would be the most
likely
>>>>>>> scenario for this to happen?
>>>>>>> 
>>>>>>> java.io.IOException: CRC check failed
>>>>>>> at
>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:612)
>>>>>>> 
>>>>>>> at
>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:157)
>>>>>>> 
>>>>>>> at
>>>>>> 
>>> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>>>>>>> at
>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272)
>>>>>>> 
>>>>>>> at
>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399)
>>>>>>> 
>>>>>>> at
>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
>>>>>>> 
>>>>>>> at
>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113)
>>>>>>> 
>>>>>>> at
>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>>>>>>> 
>>>>>>> at
>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>>>>>>> 
>>>>>>> 
>>>>>>> As a result my ZK can't be started, is it recoverable?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Ben
>>>>>>> 
>>>>>>> ---------- Forwarded message ----------
>>>>>>> From: Apache Jenkins Server <jenkins@builds.apache.org
>>> <javascript:_e(%7B%7D,'cvml','jenkins@builds.apache.org');>>
>>>>>>> Date: Sat, Apr 26, 2014 at 11:35 PM
>>>>>>> Subject: ZooKeeper_branch33_solaris - Build # 867 - Still Failing
>>>>>>> To: dev@zookeeper.apache.org
>>> <javascript:_e(%7B%7D,'cvml','dev@zookeeper.apache.org <mailto:dev@zookeeper.apache.org>');>
>>>>>>> 
>>>>>>> 
>>>>>>> See https://builds.apache.org/job/ZooKeeper_branch33_solaris/867/
>>>>>>> 
>>>>>>> 
>>>>>> 
>>> ###################################################################################
>>>>>>> ########################## LAST 60 LINES OF THE CONSOLE
>>>>>>> ###########################
>>>>>>> [...truncated 301 lines...]
>>>>>>>  [junit] java.io.IOException: CRC check failed
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:611)
>>>>>>>  [junit]     at
>>>>>>> org.apache.zookeeper.server.CRCTest.testChecksums(CRCTest.java:165)
>>>>>>>  [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>>>> Method)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>  [junit]     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>  [junit]     at junit.framework.TestCase.runTest(TestCase.java:168)
>>>>>>>  [junit]     at junit.framework.TestCase.runBare(TestCase.java:134)
>>>>>>>  [junit]     at
>>>>>> junit.framework.TestResult$1.protect(TestResult.java:110)
>>>>>>>  [junit]     at
>>>>>>> junit.framework.TestResult.runProtected(TestResult.java:128)
>>>>>>>  [junit]     at junit.framework.TestResult.run(TestResult.java:113)
>>>>>>>  [junit]     at junit.framework.TestCase.run(TestCase.java:124)
>>>>>>>  [junit]     at
>>> junit.framework.TestSuite.runTest(TestSuite.java:232)
>>>>>>>  [junit]     at junit.framework.TestSuite.run(TestSuite.java:227)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
>>>>>>>  [junit]     at
>>>>>>> junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)
>>>>>>>  [junit] 2014-04-27 06:35:01,089 - INFO  [main:CRCTest@68] -
>>> FINISHED
>>>>>>> testChecksums
>>>>>>>  [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed:
54.234
>>>>>> sec
>>>>>>>  [junit] java.io.FileNotFoundException:
>>>>>>> 
>>>>>> 
>>> /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/junitvmwatcher1383325470.properties
>>>>>>> (No such file or directory)
>>>>>>>  [junit]     at java.io.FileInputStream.open(Native Method)
>>>>>>>  [junit]     at
>>>>>> java.io.FileInputStream.<init>(FileInputStream.java:120)
>>>>>>>  [junit]     at java.io.FileReader.<init>(FileReader.java:55)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeAsForked(JUnitTask.java:1028)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:817)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeOrQueue(JUnitTask.java:1657)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:764)
>>>>>>>  [junit]     at
>>>>>>> org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
>>>>>>>  [junit]     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown
>>>>>>> Source)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>  [junit]     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
>>>>>>>  [junit]     at org.apache.tools.ant.Task.perform(Task.java:348)
>>>>>>>  [junit]     at org.apache.tools.ant.Target.execute(Target.java:357)
>>>>>>>  [junit]     at
>>>>>> org.apache.tools.ant.Target.performTasks(Target.java:385)
>>>>>>>  [junit]     at
>>>>>>> org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
>>>>>>>  [junit]     at
>>>>>>> org.apache.tools.ant.Project.executeTarget(Project.java:1298)
>>>>>>>  [junit]     at
>>>>>>> 
>>>>>> 
>>> org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
>>>>>>>  [junit]     at
>>>>>>> org.apache.tools.ant.Project.executeTargets(Project.java:1181)
>>>>>>>  [junit]     at org.apache.tools.ant.Main.runBuild(Main.java:698)
>>>>>>>  [junit]     at org.apache.tools.ant.Main.startAnt(Main.java:199)
>>>>>>>  [junit]     at
>>>>>>> org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
>>>>>>>  [junit]     at
>>>>>>> org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)
>>>>>>>  [junit] Running org.apache.zookeeper.server.DataTreeUnitTest
>>>>>>>  [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed:
0 sec
>>>>>>> 
>>>>>>> BUILD FAILED
>>>>>>> 
>>>>>> 
>>> /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/ZooKeeper_branch33_solaris/trunk/build.xml:814:
>>>>>>> Process fork failed.
>>>>>>> 
>>>>>>> Total time: 2 minutes 54 seconds
>>>>>>> Build step 'Invoke Ant' marked build as failure
>>>>>>> [locks-and-latches] Releasing all the locks
>>>>>>> [locks-and-latches] All the locks released
>>>>>>> Recording test results
>>>>>>> Email was triggered for: Failure
>>>>>>> Sending email for trigger: Failure
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>> ###################################################################################
>>>>>>> ############################## FAILED TESTS (if any)
>>>>>>> ##############################
>>>>>>> 1 tests failed.
>>>>>>> FAILED:  org.apache.zookeeper.server.DataTreeUnitTest.unknown
>>>>>>> 
>>>>>>> Error Message:
>>>>>>> Forked Java VM exited abnormally. Please note the time in the
report
>>>>>> does
>>>>>>> not reflect the time until the VM exit.
>>>>>>> 
>>>>>>> Stack Trace:
>>>>>>> junit.framework.AssertionFailedError: Forked Java VM exited
>>> abnormally.
>>>>>>> Please note the time in the report does not reflect the time
until
>>> the
>>>>>> VM
>>>>>>> exit.


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message