jackrabbit-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "sbarriba" <sbarr...@yahoo.co.uk>
Subject RE: Node corruption - simple testcase - UUID-to-binary issue?
Date Mon, 27 Aug 2007 18:03:40 GMT
Thanks Stefan - I'm glad someone else can reproduce it :)
Added as https://issues.apache.org/jira/browse/JCR-1092.
We're running here with the varbinary fix in place without any side effects
thus far.
Regards,
Shaun.

-----Original Message-----
From: Stefan Guggisberg [mailto:stefan.guggisberg@gmail.com] 
Sent: 27 August 2007 17:31
To: users@jackrabbit.apache.org
Subject: Re: Node corruption - simple testcase - UUID-to-binary issue?

hi shaun

On 8/26/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> Testing shows this issue highlights a problem with the Bundle persistence
> manager and MySqls method of handling BINARY columns.
> The solution looks to be to replace BINARY(16) with VARBINARY(16). Quoting
> from http://dev.mysql.com/doc/refman/5.0/en/binary-varbinary.html...
> "If the value retrieved must be the same as the value specified for
storage
> with no padding, it might be preferable to use VARBINARY or one of the
BLOB
> data types instead."
> A review of our logs shows that all of the corruption we've seen has
related
> to nodes with UUIDs including 0's.

i was finally able to reproduce the issue. i tested with
mysql-connector-java-5.0.7
and mysql 4.1.13 on win2k and mysql 5.0.45 on redhat enterprise.
the issue appeared on both therefore it doesn't seem to be platform
dependent.

my guess is that it's jdbc driver/server version dependent.

trailing 0 bytes seem to cause the issue. looks like they're mistaken
for padding.

>
> * Shall I log a JIRA ticket for this?

yes, that would be great. please include all relevant information.

> * Anyone see any issues with this fix?

personally i don't like workarounds that i don't understand ;-)
i'd like to identify the root cause first. i'll further investigate...

thanks
stefan

>
>
> In the following example you can see I'm showing all bundles in the
"test1"
> workspace.
>
> mysql> select hex(node_id) from test1_bundle;
> +----------------------------------+
> | hex(node_id)                     |
> +----------------------------------+
> | 28126C3E36A0471D9CDC5AC423BAC9C5 |
> | A55F3F6BA9094E8DB65A93002CED0920 |
> | CAFEBABECAFEBABECAFEBABECAFEBABE |
> | D638EACCDEB641FD8868804C8ECEFFFD |
> | DEADBEEFCAFEBABECAFEBABECAFEBABE |
> +----------------------------------+
> 5 rows in set (0.00 sec)
>
> ...but a select using the same UUID hex value returns no rows.
>
> mysql>  select node_id from test1_bundle where
> unhex('A55F3F6BA9094E8DB65A93002CED0920') = node_id;
> Empty set (0.00 sec)
>
> I've then created a new "test3" workspace which I modified to use
varbinary
> instead of binary with:
>
> alter table test3_bundle modify NODE_ID varbinary(16);
> alter table test3_refs modify NODE_ID varbinary(16);
>
> My import test case now no longer fails and the following query proves
that
> query operations, after a store, return rows as expected.
>
> mysql>  select node_id from test3_bundle where
> unhex('A55F3F6BA9094E8DB65A93002CED0920') = node_id;
> +------------------+
> | node_id          |
> +--------¶Z ,í--  |
> +------------------+
> 1 row in set (0.00 sec)
>
> mysql> desc test3_bundle;
> ERROR 2006 (HY000): MySQL server has gone away
> No connection. Trying to reconnect...
> Connection id:    7116
> Current database: mmptest
>
> +-------------+---------------+------+-----+---------+-------+
> | Field       | Type          | Null | Key | Default | Extra |
> +-------------+---------------+------+-----+---------+-------+
> | NODE_ID     | varbinary(16) | YES  | UNI | NULL    |       |
> | BUNDLE_DATA | longblob      | NO   |     |         |       |
> +-------------+---------------+------+-----+---------+-------+
> 2 rows in set (0.00 sec)
>
>
>
>
> mysql>  alter table test3_bundle modify NODE_ID varbinary(16);
> Query OK, 2 rows affected (0.00 sec)
> Records: 2  Duplicates: 0  Warnings: 0
>
>
> -----Original Message-----
> From: sbarriba [mailto:sbarriba@yahoo.co.uk]
> Sent: 25 August 2007 18:31
> To: 'users@jackrabbit.apache.org'
> Subject: RE: Node corruption - simple testcase - UUID-to-binary issue?
>
> Hi all,
> After detailed debugging of the corruption issue I 'think' we've drilled
> down to the likely cause of the issue.
>
> In summary it looks like the binary values read back from MySql where the
> UUID contains 0's is not the same as that generated from the UUID
> getRawBytes() call. As result, you can store a node with the UUID that has
> 0's but its never found when read back. This therefore causes corruption
in
> random places when certain UUIDs are generated.
>
> Test Case:
>
> I've attached 2 files. One causes node corruption when imported, the other
> does not.
> The only difference is that I removed any 0 values from the problem UUID
in
> the file that causes corruption.
>
> As Stefan pointed out, I had manipulated the test case to use standard nt
> types when in fact I should have provided the following info (sorry
Stefan)
> e.g. the test folder types are referencable hence the jcr:uuid allocation
>
> [acme:Folder] > nt:folder, mix:referenceable
>
> If I import causes_corruption.xml and then attempt to "ls"
> AclObjectIdentities then loadBundle() returns null for the UUID
>
> a55f3f6b-a909-4e8d-b65a-93002ced0920 which in bytes is [-91, 95, 63, 107,
> -87, 9, 78, -115, -74, 90, -109, 0, 44, -19, 9, 32]
>
> If I import works.xml then "ls" works fine for the same node as I've
> manually changed the UUID to replace 0s with 1s in the last section.
>
> a55f3f6b-a909-4e8d-b65a-93112ced1921 [-91, 95, 63, 107, -87, 9, 78, -115,
> -74, 90, -109, 17, 44, -19, 25, 33]
>
> I'm assuming therefore this is potentially a MySql or JDBC driver issue?
> The problem occurs on Redhat Enterprise Linux, Mysql 5.0.45 and
> mysql-connector-java-5.0.6-bin.jar.
> I'm unable to reproduce this on our Windows machines or Fedora box.
>
> All suggestions greatly appreciated.
> Regards,
> Shaun.
>
>
> -----Original Message-----
> From: Stefan Guggisberg [mailto:stefan.guggisberg@gmail.com]
> Sent: 20 August 2007 15:20
> To: users@jackrabbit.apache.org
> Subject: Re: Node corruption - simple testcase
>
> hi shaun,
>
> On 8/20/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> > Hi all,
> > After much investigation we have a simple reproducible testcase to
> > illustrate an example of corruption using only JackRabbit command line
> > and JackRabbit types. Full test case given below to illustrate the
> problem.
> >
> > Before getting into the detail, is there anything platform specific
> > about UUID values?
>
> no
>
> > Our Redhat platform has a real problem with the UUID
> > "a55f3f6b-a909-4e8d-b65a-93002ced0920"?
> >
> > We know:
> > * It appears to be related to the "importxml" process and certain UUID
> > values.
> > * The problem only occurs on our Redhat Enterprise Linux 4 platform,
> > not windows
> > * We're using Java 1.5.0_12, MySql 5.0.45
> > * Its not related to command line API as the same occurs when run
> > programmatically from our application
> > * The problem originally occurred with one of our bespoke types but
> > changing them to nt:folder still causes the problem. See attached.
>
> hmm...
>
> here's an excerpt from your xml file:
>
> <sv:node sv:name="1">
>   <sv:property sv:name="jcr:primaryType" sv:type="Name">
>     <sv:value>nt:folder</sv:value>
>   </sv:property>
>   <sv:property sv:name="jcr:uuid" sv:type="String">
>     <sv:value>a55f3f6b-a909-4e8d-b65a-93002ced0920</sv:value>
>   </sv:property>
>
> nt:folder nodes can't possibly have a jcr:uuid property. did you perhaps
> manipulate the built-in types?
>
> however, i ran a quick test (using the api), performing the steps that you
> described, on windows and on mac os-x without any problems.
>
> i don't believe that the problem's platform dependent. anyway, since i
don't
> have a redhat linux available i can't test it.
>
> are you sure both repositories have identical configurations (incl.
types)?
> did you check the log file? set the log level to debug or lower and re-run
> the test case.
>
> cheers
> stefan
>
> >
> > ### We create an empty test workspace
> >
> > [not connected] > startjackrabbit /var/acme/repository.xml
> > /var/acme/repository [not logged in] > login [/] > createworkspace
> > mmp659_test [/] > logout
> >
> > ### We log into the workspace and import the attached file [not logged
> > in] > login admin admin -workspace mmp659_test [/] > importxml
> > /tmp/Security_nt.xml [/] > save
> >
> > ### We prove that folder "1" has been created [/] > cd Security
> > [/Security] > cd AclObjectIdentities [/Security/AclObjectIdentities] >
> > ls
> > name                           type            node      new
> modified
> > ------------------------------ --------------- --------- ---------
> ---------
> > 1                              nt:folder       true      false     false
> > jcr:created                    Date            false     false     false
> > jcr:primaryType                Name            false     false     false
> >
> > total
> >
> > elapsed time: 2 ms.
> >
> > [/Security/AclObjectIdentities] >
> > [/Security/AclObjectIdentities] > exit
> >
> > ### We restart the repository and show that folder "1" was not
> > successfully created and AclObjectIdentities is now corrupt
> >
> > [not connected] > startjackrabbit /var/acme/repository.xml
> > /var/acme/repository [not logged in] > login admin admin -workspace
> > mmp659_test [/] > cd Security [/Security] > ls
> > name                           type            node      new
> modified
> > ------------------------------ --------------- --------- ---------
> ---------
> > AclObjectIdentities            nt:folder       true      false     false
> > jcr:created                    Date            false     false     false
> > jcr:primaryType                Name            false     false     false
> >
> > total
> >
> > [/Security] > cd AclObjectIdentities
> > [/Security/AclObjectIdentities] > ls
> > name                           type            node      new
> modified
> > ------------------------------ --------------- --------- ---------
> ---------
> > jcr:created                    Date            false     false     false
> > jcr:primaryType                Name            false     false     false
> >
> > total
> >
> > [/Security/AclObjectIdentities] > cd ..
> >
> > display stack trace? [y/n]n
> > [/Security] > removeitem AclObjectIdentities
> >
> > an exception occurred
> >
> > exception: javax.jcr.ItemNotFoundException
> > message: a55f3f6b-a909-4e8d-b65a-93002ced0920
> >
> > Regards,
> > Shaun
> >
> >
> > -----Original Message-----
> > From: sbarriba [mailto:sbarriba@yahoo.co.uk]
> > Sent: 17 August 2007 15:15
> > To: users@jackrabbit.apache.org
> > Subject: RE: Node corruption using Jackrabbit 1.3.1?
> >
> > Hi Stefan et al,
> > We no longer think it's a 1.3.1 issue as we've run the consistency
> > checker on other environments which were managed using 1.3 and found a
> > couple of consistency problems reported.
> >
> > We've got a 'semi-reproducable' test case which causes corruption.
> > Test case:
> >  - no clustering disabled
> >  - a single model 2 JackRabbit resource
> >  - one node running only
> >
> > On start-up the application is automatically seeding a particular
> > workspace by importing an existing sysview XML file.
> > The import works up until the point we get a "invalidated item"
> > message (shown below).
> >
> > ......
> > 17 Aug 2007 14:41:42,927 DEBUG org.apache.jackrabbit.core.ItemManager
> > - created item
> > f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/
> > conten
> > tmodel}objectIdentity
> > 17 Aug 2007 14:41:42,928 DEBUG org.apache.jackrabbit.core.ItemManager
> > - caching item
> > f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/
> > conten
> > tmodel}objectIdentity
> > 17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager
> > - invalidated item a55f3f6b-a909-4e8d-b65a-93002ced0920
> > 17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager
> > - removing item a55f3f6b-a909-4e8d-b65a-93002ced0920 from cache
> > 17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemImpl  -
> > /home/miqsys/Security/AclObjectIdentities/1: unable to update item.
> > 17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemManager
> > - invalidated item
> > f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/
> > conten
> > tmodel}objectIdentity
> > .....
> >
> > Any nodes attempting to read the offending node gets:
> > 17 Aug 2007 14:41:43,559 DEBUG
> > org.apache.jackrabbit.core.HierarchyManagerImpl  - failed to resolve
> > name of a55f3f6b-a909-4e8d-b65a-93002ced0920
> >
> > Any nodes attempting to update the offending node gets:
> >
> > unable to update item.: a55f3f6b-a909-4e8d-b65a-93002ced0920:
> > a55f3f6b-a909-4e8d-b65a-93002ced0920
> >         at
> > org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1222)
> >
> > Any clues on what causes the "invalidated item" message?
> >
> > Regards,
> > Shaun.
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Stefan Guggisberg [mailto:stefan.guggisberg@gmail.com]
> > Sent: 17 August 2007 14:04
> > To: users@jackrabbit.apache.org
> > Subject: Re: Node corruption using Jackrabbit 1.3.1?
> >
> > hi shaun,
> >
> > are you sure that this is a 1.3.1 specific issue?
> >
> > i remember an earlier post were you described the same problem, but
> > apparently you weren't using 1.3.1:
> > http://www.nabble.com/Strange-%22ignoring-nonexistent-item%22-and-remo
> > veitem
> > -fails-tf4169086.html
> >
> > On 8/17/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> > > Hi Stefan et al,
> > > Further update on this, plus some answers to your questions.
> > >
> > > The consistency check and fix logic in JackRabbit 1.3.1 solved all
> > > but 1
> > of
> > > the issues. However although the log reports the remaining issue has
> > > been fixed each time, this message appears after repeated restarts
> > > :(
> > >
> > > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceMan
> > > ager
> > -
> > > acme: checked 1000/0 bundles...
> > > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceMan
> > > ager
> > -
> > > NodeState fe75116c-5617-423b-8c9a-4a964b667f20 references unexistent
> > > child {http://www.acme.co.uk/xmlns/contentmodel}components with id
> > > d3c09b52-d3be-4d3c-8807-b7827d337973
> > > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceMan
> > > ager
> > -
> > > acme: checked 2000/0 bundles...
> > > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceMan
> > > ager
> > -
> > > acme: Fixing 1 inconsistent bundle(s)...
> > > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceMan
> > > ager
> > -
> > > acme: Fixing bundle fe75116c-5617-423b-8c9a-4a964b667f20
> > > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceMan
> > > ager
> > -
> > > acme: checked 2505/0 bundles.
> > >
> > > Is the consistency checker the only way to fix up these problems, or
> > > is there any way we can 'open the hood' to investigate further?
> >
> > only by getting your hands really dirty and by delving deep into the
> code...
> >
> > >
> > > Stefan wrote:
> > > "did you notice anything peculiar about the corrupt nodes? is there
> > > a
> > chance
> > > to reconstruct the steps that lead to this state?"
> > >
> > > What tools what you recommend using to review the corrupt nodes? We
> > > only currently use the command contrib. project.
> > >
> > > Reproducing this scenario is proving really difficult. The original
> > > corruption occurred when a user was creating a particularly complex
> > > node object which included the creation, deletion and re-ordering of
> > > various same-name-siblings. After multi-hours of attempts we are yet
> > > to reproduce the event. Frustrating, but we know its occurred at least
> twice.
> > >
> > > "furthermore, could you please share some details about your
> > > config/deployment?"
> > >
> > > Sure.
> > >  - JackRabbit 1.3.1
> > >         - MySql Bundle Persistence Manager
> > >         - Clustered across 2 nodes - only 1 node is read-write, the
> > > other
> > is
> > > read-only to the repos
> > >       - Spring used to provide a JackRabbit JCRSessionInHttpSession
> > pattern
> > > for the editors who are using a web-based UI.
> >
> > i am not familiar with this. how is the repository instance
> > accessed/created?
> > can you rule out the possibility that a 3rd r/w non-cluster aware
> > instance is created?
> >
> > >  - MySql 5.0.45
> > >  - Tomcat 5.0.30
> > >  - Sun JDK 1.5
> > >  - Redhat Enterprise Linux
> > >
> > > All suggestions welcome.
> >
> > hmm, just a few random guesses....
> >
> > could be
> > - a bundle db pm-related issue
> > - a clustering- or clustering-config related issue
> > - an issue caused by multiple r/w jackrabbit instances
> >   accessing the same db
> > - a jr core issue
> >
> > since this is a rather sophisticated setup it's not gonna be easy to
> > investigate.
> > however, we'd definitely need more information about the operations
> > that lead to the corrupt state.
> >
> > btw: please feel free to create a jira issue.
> >
> > cheers
> > stefan
> >
> > > Regards,
> > > Shaun
> > >
> > >
> > > -----Original Message-----
> > > From: Stefan Guggisberg [mailto:stefan.guggisberg@gmail.com]
> > > Sent: 17 August 2007 11:26
> > > To: users@jackrabbit.apache.org
> > > Subject: Re: Node corruption using Jackrabbit 1.3.1?
> > >
> > > hi shaun,
> > >
> > > On 8/16/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> > > > Hi all,
> > > >
> > > > We upgraded to JackRabbit 1.3.1 a few days ago.
> > > >
> > > > We have since seen a couple of occasions where we've been able to
> > > > get
> > the
> > > > repository in an indeterminate state. The following output shows
> > > > the
> > state
> > > > of a node which has an ordered child node property called
> > acme:components
> > > > e.g.
> > > >
> > > >
> > > >
> > > > [miq:FooBar] > nt:base
> > > >
> > > >                orderable
> > > >
> > > >                + acme:components (acme:Component) multiple COPY
> > > >
> > > >
> > > >
> > > > We have an instance of FooBar where acme:components[5] has
> disappeared??
> > > >
> > > > e.g.
> > > >
> > > >
> > > >
> > > > name                           type            node      new
> > > modified
> > > >
> > > > ------------------------------ --------------- --------- ---------
> > > ---------
> > > >
> > > > acme:components                 acme:Section     true      false
> > false
> > > >
> > > > acme:components[2]              acme:Text        true      false
> > false
> > > >
> > > > acme:components[3]              acme:Text        true      false
> > false
> > > >
> > > > acme:components[4]              acme:Text        true      false
> > false
> > > >
> > > > acme:components[6]              acme:Section     true      false
> > false
> > > >
> > > > acme:components[7]              acme:Section     true      false
> > false
> > > >
> > > > jcr:created                    Date            false     false
> false
> > > >
> > > > jcr:primaryType                Name            false     false
> false
> > > >
> > > > jcr:uuid                       String          false     false
> false
> > > >
> > > >
> > > >
> > > > I presume this could happen if the deletion of the child node
> > > > succeeded
> > by
> > > > the saving of the parent FooBar node failed for some reason?
> > >
> > > that should be possible since the changelog of a save operation is
> > > stored atomically. if an error occurs during processing of the
> > > change log all previous changes are rolled back.
> > >
> > > >
> > > >
> > > >
> > > > Surely this is a state that should never happen?
> > >
> > > absolutely, and the problem you're describing is very alarming indeed!
> > >
> > > did you notice anything peculiar about the corrupt nodes? is there a
> > chance
> > > to reconstruct the steps that lead to this state?
> > >
> > > furthermore, could you please share some details about your
> > > config/deployment?
> > >
> > > the only possible explanation i can currently come up with is that
> > > there are multiple jackrabbit instances accessing the same
> > > database...
> > >
> > > cheers
> > > stefan
> > >
> > >
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Shaun
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>


Mime
View raw message