jackrabbit-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: Node corruption - simple testcase
Date Mon, 20 Aug 2007 14:20:07 GMT
hi shaun,

On 8/20/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> Hi all,
> After much investigation we have a simple reproducible testcase to
> illustrate an example of corruption using only JackRabbit command line and
> JackRabbit types. Full test case given below to illustrate the problem.
>
> Before getting into the detail, is there anything platform specific about
> UUID values?

no

> Our Redhat platform has a real problem with the UUID
> "a55f3f6b-a909-4e8d-b65a-93002ced0920"?
>
> We know:
> * It appears to be related to the "importxml" process and certain UUID
> values.
> * The problem only occurs on our Redhat Enterprise Linux 4 platform, not
> windows
> * We're using Java 1.5.0_12, MySql 5.0.45
> * Its not related to command line API as the same occurs when run
> programmatically from our application
> * The problem originally occurred with one of our bespoke types but changing
> them to nt:folder still causes the problem. See attached.

hmm...

here's an excerpt from your xml file:

<sv:node sv:name="1">
  <sv:property sv:name="jcr:primaryType" sv:type="Name">
    <sv:value>nt:folder</sv:value>
  </sv:property>
  <sv:property sv:name="jcr:uuid" sv:type="String">
    <sv:value>a55f3f6b-a909-4e8d-b65a-93002ced0920</sv:value>
  </sv:property>

nt:folder nodes can't possibly have a jcr:uuid property. did you
perhaps manipulate the built-in types?

however, i ran a quick test (using the api), performing the steps
that you described, on windows and on mac os-x without any problems.

i don't believe that the problem's platform dependent. anyway,
since i don't have a redhat linux available i can't test it.

are you sure both repositories have identical configurations (incl. types)?
did you check the log file? set the log level to debug or lower and re-run
the test case.

cheers
stefan

>
> ### We create an empty test workspace
>
> [not connected] > startjackrabbit /var/acme/repository.xml
> /var/acme/repository
> [not logged in] > login
> [/] > createworkspace mmp659_test
> [/] > logout
>
> ### We log into the workspace and import the attached file
> [not logged in] > login admin admin -workspace mmp659_test
> [/] > importxml /tmp/Security_nt.xml
> [/] > save
>
> ### We prove that folder "1" has been created
> [/] > cd Security
> [/Security] > cd AclObjectIdentities
> [/Security/AclObjectIdentities] > ls
> name                           type            node      new       modified
> ------------------------------ --------------- --------- --------- ---------
> 1                              nt:folder       true      false     false
> jcr:created                    Date            false     false     false
> jcr:primaryType                Name            false     false     false
>
> total
>
> elapsed time: 2 ms.
>
> [/Security/AclObjectIdentities] >
> [/Security/AclObjectIdentities] > exit
>
> ### We restart the repository and show that folder "1" was not successfully
> created and AclObjectIdentities is now corrupt
>
> [not connected] > startjackrabbit /var/acme/repository.xml
> /var/acme/repository
> [not logged in] > login admin admin -workspace mmp659_test
> [/] > cd Security
> [/Security] > ls
> name                           type            node      new       modified
> ------------------------------ --------------- --------- --------- ---------
> AclObjectIdentities            nt:folder       true      false     false
> jcr:created                    Date            false     false     false
> jcr:primaryType                Name            false     false     false
>
> total
>
> [/Security] > cd AclObjectIdentities
> [/Security/AclObjectIdentities] > ls
> name                           type            node      new       modified
> ------------------------------ --------------- --------- --------- ---------
> jcr:created                    Date            false     false     false
> jcr:primaryType                Name            false     false     false
>
> total
>
> [/Security/AclObjectIdentities] > cd ..
>
> display stack trace? [y/n]n
> [/Security] > removeitem AclObjectIdentities
>
> an exception occurred
>
> exception: javax.jcr.ItemNotFoundException
> message: a55f3f6b-a909-4e8d-b65a-93002ced0920
>
> Regards,
> Shaun
>
>
> -----Original Message-----
> From: sbarriba [mailto:sbarriba@yahoo.co.uk]
> Sent: 17 August 2007 15:15
> To: users@jackrabbit.apache.org
> Subject: RE: Node corruption using Jackrabbit 1.3.1?
>
> Hi Stefan et al,
> We no longer think it's a 1.3.1 issue as we've run the consistency checker
> on other environments which were managed using 1.3 and found a couple of
> consistency problems reported.
>
> We've got a 'semi-reproducable' test case which causes corruption.
> Test case:
>  - no clustering disabled
>  - a single model 2 JackRabbit resource
>  - one node running only
>
> On start-up the application is automatically seeding a particular workspace
> by importing an existing sysview XML file.
> The import works up until the point we get a "invalidated item" message
> (shown below).
>
> ......
> 17 Aug 2007 14:41:42,927 DEBUG org.apache.jackrabbit.core.ItemManager  -
> created item
> f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/conten
> tmodel}objectIdentity
> 17 Aug 2007 14:41:42,928 DEBUG org.apache.jackrabbit.core.ItemManager  -
> caching item
> f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/conten
> tmodel}objectIdentity
> 17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager  -
> invalidated item a55f3f6b-a909-4e8d-b65a-93002ced0920
> 17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager  -
> removing item a55f3f6b-a909-4e8d-b65a-93002ced0920 from cache
> 17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemImpl  -
> /home/miqsys/Security/AclObjectIdentities/1: unable to update item.
> 17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemManager  -
> invalidated item
> f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/conten
> tmodel}objectIdentity
> .....
>
> Any nodes attempting to read the offending node gets:
> 17 Aug 2007 14:41:43,559 DEBUG
> org.apache.jackrabbit.core.HierarchyManagerImpl  - failed to resolve name of
> a55f3f6b-a909-4e8d-b65a-93002ced0920
>
> Any nodes attempting to update the offending node gets:
>
> unable to update item.: a55f3f6b-a909-4e8d-b65a-93002ced0920:
> a55f3f6b-a909-4e8d-b65a-93002ced0920
>         at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1222)
>
> Any clues on what causes the "invalidated item" message?
>
> Regards,
> Shaun.
>
>
>
>
>
> -----Original Message-----
> From: Stefan Guggisberg [mailto:stefan.guggisberg@gmail.com]
> Sent: 17 August 2007 14:04
> To: users@jackrabbit.apache.org
> Subject: Re: Node corruption using Jackrabbit 1.3.1?
>
> hi shaun,
>
> are you sure that this is a 1.3.1 specific issue?
>
> i remember an earlier post were you described the same problem,
> but apparently you weren't using 1.3.1:
> http://www.nabble.com/Strange-%22ignoring-nonexistent-item%22-and-removeitem
> -fails-tf4169086.html
>
> On 8/17/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> > Hi Stefan et al,
> > Further update on this, plus some answers to your questions.
> >
> > The consistency check and fix logic in JackRabbit 1.3.1 solved all but 1
> of
> > the issues. However although the log reports the remaining issue has been
> > fixed each time, this message appears after repeated restarts :(
> >
> > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
> -
> > acme: checked 1000/0 bundles...
> > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
> -
> > NodeState fe75116c-5617-423b-8c9a-4a964b667f20 references unexistent child
> > {http://www.acme.co.uk/xmlns/contentmodel}components with id
> > d3c09b52-d3be-4d3c-8807-b7827d337973
> > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
> -
> > acme: checked 2000/0 bundles...
> > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
> -
> > acme: Fixing 1 inconsistent bundle(s)...
> > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
> -
> > acme: Fixing bundle fe75116c-5617-423b-8c9a-4a964b667f20
> > org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
> -
> > acme: checked 2505/0 bundles.
> >
> > Is the consistency checker the only way to fix up these problems, or is
> > there any way we can 'open the hood' to investigate further?
>
> only by getting your hands really dirty and by delving deep into the code...
>
> >
> > Stefan wrote:
> > "did you notice anything peculiar about the corrupt nodes? is there a
> chance
> > to reconstruct the steps that lead to this state?"
> >
> > What tools what you recommend using to review the corrupt nodes? We only
> > currently use the command contrib. project.
> >
> > Reproducing this scenario is proving really difficult. The original
> > corruption occurred when a user was creating a particularly complex node
> > object which included the creation, deletion and re-ordering of various
> > same-name-siblings. After multi-hours of attempts we are yet to reproduce
> > the event. Frustrating, but we know its occurred at least twice.
> >
> > "furthermore, could you please share some details about your
> > config/deployment?"
> >
> > Sure.
> >  - JackRabbit 1.3.1
> >         - MySql Bundle Persistence Manager
> >         - Clustered across 2 nodes - only 1 node is read-write, the other
> is
> > read-only to the repos
> >       - Spring used to provide a JackRabbit JCRSessionInHttpSession
> pattern
> > for the editors who are using a web-based UI.
>
> i am not familiar with this. how is the repository instance
> accessed/created?
> can you rule out the possibility that a 3rd r/w non-cluster aware instance
> is
> created?
>
> >  - MySql 5.0.45
> >  - Tomcat 5.0.30
> >  - Sun JDK 1.5
> >  - Redhat Enterprise Linux
> >
> > All suggestions welcome.
>
> hmm, just a few random guesses....
>
> could be
> - a bundle db pm-related issue
> - a clustering- or clustering-config related issue
> - an issue caused by multiple r/w jackrabbit instances
>   accessing the same db
> - a jr core issue
>
> since this is a rather sophisticated setup it's not gonna be easy to
> investigate.
> however, we'd definitely need more information about the operations that
> lead
> to the corrupt state.
>
> btw: please feel free to create a jira issue.
>
> cheers
> stefan
>
> > Regards,
> > Shaun
> >
> >
> > -----Original Message-----
> > From: Stefan Guggisberg [mailto:stefan.guggisberg@gmail.com]
> > Sent: 17 August 2007 11:26
> > To: users@jackrabbit.apache.org
> > Subject: Re: Node corruption using Jackrabbit 1.3.1?
> >
> > hi shaun,
> >
> > On 8/16/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> > > Hi all,
> > >
> > > We upgraded to JackRabbit 1.3.1 a few days ago.
> > >
> > > We have since seen a couple of occasions where we've been able to get
> the
> > > repository in an indeterminate state. The following output shows the
> state
> > > of a node which has an ordered child node property called
> acme:components
> > > e.g.
> > >
> > >
> > >
> > > [miq:FooBar] > nt:base
> > >
> > >                orderable
> > >
> > >                + acme:components (acme:Component) multiple COPY
> > >
> > >
> > >
> > > We have an instance of FooBar where acme:components[5] has disappeared??
> > >
> > > e.g.
> > >
> > >
> > >
> > > name                           type            node      new
> > modified
> > >
> > > ------------------------------ --------------- --------- ---------
> > ---------
> > >
> > > acme:components                 acme:Section     true      false
> false
> > >
> > > acme:components[2]              acme:Text        true      false
> false
> > >
> > > acme:components[3]              acme:Text        true      false
> false
> > >
> > > acme:components[4]              acme:Text        true      false
> false
> > >
> > > acme:components[6]              acme:Section     true      false
> false
> > >
> > > acme:components[7]              acme:Section     true      false
> false
> > >
> > > jcr:created                    Date            false     false     false
> > >
> > > jcr:primaryType                Name            false     false     false
> > >
> > > jcr:uuid                       String          false     false     false
> > >
> > >
> > >
> > > I presume this could happen if the deletion of the child node succeeded
> by
> > > the saving of the parent FooBar node failed for some reason?
> >
> > that should be possible since the changelog of a save operation is stored
> > atomically. if an error occurs during processing of the change log all
> > previous changes are rolled back.
> >
> > >
> > >
> > >
> > > Surely this is a state that should never happen?
> >
> > absolutely, and the problem you're describing is very alarming indeed!
> >
> > did you notice anything peculiar about the corrupt nodes? is there a
> chance
> > to reconstruct the steps that lead to this state?
> >
> > furthermore, could you please share some details about your
> > config/deployment?
> >
> > the only possible explanation i can currently come up with is
> > that there are multiple jackrabbit instances accessing the same
> > database...
> >
> > cheers
> > stefan
> >
> >
> > >
> > >
> > >
> > > Regards,
> > >
> > > Shaun
> > >
> > >
> > >
> > >
> >
> >
>
>

Mime
View raw message