jackrabbit-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "sbarriba" <sbarr...@yahoo.co.uk>
Subject RE: Node corruption - simple testcase
Date Mon, 20 Aug 2007 09:23:13 GMT
Hi all,
After much investigation we have a simple reproducible testcase to
illustrate an example of corruption using only JackRabbit command line and
JackRabbit types. Full test case given below to illustrate the problem.

Before getting into the detail, is there anything platform specific about
UUID values?
Our Redhat platform has a real problem with the UUID
"a55f3f6b-a909-4e8d-b65a-93002ced0920"? 

We know:
* It appears to be related to the "importxml" process and certain UUID
values.
* The problem only occurs on our Redhat Enterprise Linux 4 platform, not
windows
* We're using Java 1.5.0_12, MySql 5.0.45
* Its not related to command line API as the same occurs when run
programmatically from our application
* The problem originally occurred with one of our bespoke types but changing
them to nt:folder still causes the problem. See attached.

### We create an empty test workspace

[not connected] > startjackrabbit /var/acme/repository.xml
/var/acme/repository
[not logged in] > login
[/] > createworkspace mmp659_test
[/] > logout

### We log into the workspace and import the attached file
[not logged in] > login admin admin -workspace mmp659_test
[/] > importxml /tmp/Security_nt.xml
[/] > save

### We prove that folder "1" has been created
[/] > cd Security
[/Security] > cd AclObjectIdentities
[/Security/AclObjectIdentities] > ls
name                           type            node      new       modified
------------------------------ --------------- --------- --------- ---------
1                              nt:folder       true      false     false
jcr:created                    Date            false     false     false
jcr:primaryType                Name            false     false     false

total

elapsed time: 2 ms.

[/Security/AclObjectIdentities] >
[/Security/AclObjectIdentities] > exit

### We restart the repository and show that folder "1" was not successfully
created and AclObjectIdentities is now corrupt

[not connected] > startjackrabbit /var/acme/repository.xml
/var/acme/repository
[not logged in] > login admin admin -workspace mmp659_test
[/] > cd Security
[/Security] > ls
name                           type            node      new       modified
------------------------------ --------------- --------- --------- ---------
AclObjectIdentities            nt:folder       true      false     false
jcr:created                    Date            false     false     false
jcr:primaryType                Name            false     false     false

total

[/Security] > cd AclObjectIdentities
[/Security/AclObjectIdentities] > ls
name                           type            node      new       modified
------------------------------ --------------- --------- --------- ---------
jcr:created                    Date            false     false     false
jcr:primaryType                Name            false     false     false

total

[/Security/AclObjectIdentities] > cd ..

display stack trace? [y/n]n
[/Security] > removeitem AclObjectIdentities

an exception occurred

exception: javax.jcr.ItemNotFoundException
message: a55f3f6b-a909-4e8d-b65a-93002ced0920

Regards,
Shaun


-----Original Message-----
From: sbarriba [mailto:sbarriba@yahoo.co.uk] 
Sent: 17 August 2007 15:15
To: users@jackrabbit.apache.org
Subject: RE: Node corruption using Jackrabbit 1.3.1?

Hi Stefan et al,
We no longer think it's a 1.3.1 issue as we've run the consistency checker
on other environments which were managed using 1.3 and found a couple of
consistency problems reported.

We've got a 'semi-reproducable' test case which causes corruption.
Test case:
 - no clustering disabled
 - a single model 2 JackRabbit resource
 - one node running only

On start-up the application is automatically seeding a particular workspace
by importing an existing sysview XML file.
The import works up until the point we get a "invalidated item" message
(shown below).

......
17 Aug 2007 14:41:42,927 DEBUG org.apache.jackrabbit.core.ItemManager  -
created item
f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/conten
tmodel}objectIdentity
17 Aug 2007 14:41:42,928 DEBUG org.apache.jackrabbit.core.ItemManager  -
caching item
f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/conten
tmodel}objectIdentity
17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager  -
invalidated item a55f3f6b-a909-4e8d-b65a-93002ced0920
17 Aug 2007 14:41:43,195 DEBUG org.apache.jackrabbit.core.ItemManager  -
removing item a55f3f6b-a909-4e8d-b65a-93002ced0920 from cache
17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemImpl  -
/home/miqsys/Security/AclObjectIdentities/1: unable to update item.
17 Aug 2007 14:41:43,196 DEBUG org.apache.jackrabbit.core.ItemManager  -
invalidated item
f201a800-05ae-4cbd-bdb8-7d4f6c219403/{http://www.mobileiq.co.uk/xmlns/conten
tmodel}objectIdentity
.....

Any nodes attempting to read the offending node gets:
17 Aug 2007 14:41:43,559 DEBUG
org.apache.jackrabbit.core.HierarchyManagerImpl  - failed to resolve name of
a55f3f6b-a909-4e8d-b65a-93002ced0920

Any nodes attempting to update the offending node gets:

unable to update item.: a55f3f6b-a909-4e8d-b65a-93002ced0920:
a55f3f6b-a909-4e8d-b65a-93002ced0920
        at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1222)

Any clues on what causes the "invalidated item" message?

Regards,
Shaun.





-----Original Message-----
From: Stefan Guggisberg [mailto:stefan.guggisberg@gmail.com] 
Sent: 17 August 2007 14:04
To: users@jackrabbit.apache.org
Subject: Re: Node corruption using Jackrabbit 1.3.1?

hi shaun,

are you sure that this is a 1.3.1 specific issue?

i remember an earlier post were you described the same problem,
but apparently you weren't using 1.3.1:
http://www.nabble.com/Strange-%22ignoring-nonexistent-item%22-and-removeitem
-fails-tf4169086.html

On 8/17/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> Hi Stefan et al,
> Further update on this, plus some answers to your questions.
>
> The consistency check and fix logic in JackRabbit 1.3.1 solved all but 1
of
> the issues. However although the log reports the remaining issue has been
> fixed each time, this message appears after repeated restarts :(
>
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
-
> acme: checked 1000/0 bundles...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
-
> NodeState fe75116c-5617-423b-8c9a-4a964b667f20 references unexistent child
> {http://www.acme.co.uk/xmlns/contentmodel}components with id
> d3c09b52-d3be-4d3c-8807-b7827d337973
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
-
> acme: checked 2000/0 bundles...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
-
> acme: Fixing 1 inconsistent bundle(s)...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
-
> acme: Fixing bundle fe75116c-5617-423b-8c9a-4a964b667f20
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager
-
> acme: checked 2505/0 bundles.
>
> Is the consistency checker the only way to fix up these problems, or is
> there any way we can 'open the hood' to investigate further?

only by getting your hands really dirty and by delving deep into the code...

>
> Stefan wrote:
> "did you notice anything peculiar about the corrupt nodes? is there a
chance
> to reconstruct the steps that lead to this state?"
>
> What tools what you recommend using to review the corrupt nodes? We only
> currently use the command contrib. project.
>
> Reproducing this scenario is proving really difficult. The original
> corruption occurred when a user was creating a particularly complex node
> object which included the creation, deletion and re-ordering of various
> same-name-siblings. After multi-hours of attempts we are yet to reproduce
> the event. Frustrating, but we know its occurred at least twice.
>
> "furthermore, could you please share some details about your
> config/deployment?"
>
> Sure.
>  - JackRabbit 1.3.1
>         - MySql Bundle Persistence Manager
>         - Clustered across 2 nodes - only 1 node is read-write, the other
is
> read-only to the repos
>       - Spring used to provide a JackRabbit JCRSessionInHttpSession
pattern
> for the editors who are using a web-based UI.

i am not familiar with this. how is the repository instance
accessed/created?
can you rule out the possibility that a 3rd r/w non-cluster aware instance
is
created?

>  - MySql 5.0.45
>  - Tomcat 5.0.30
>  - Sun JDK 1.5
>  - Redhat Enterprise Linux
>
> All suggestions welcome.

hmm, just a few random guesses....

could be
- a bundle db pm-related issue
- a clustering- or clustering-config related issue
- an issue caused by multiple r/w jackrabbit instances
  accessing the same db
- a jr core issue

since this is a rather sophisticated setup it's not gonna be easy to
investigate.
however, we'd definitely need more information about the operations that
lead
to the corrupt state.

btw: please feel free to create a jira issue.

cheers
stefan

> Regards,
> Shaun
>
>
> -----Original Message-----
> From: Stefan Guggisberg [mailto:stefan.guggisberg@gmail.com]
> Sent: 17 August 2007 11:26
> To: users@jackrabbit.apache.org
> Subject: Re: Node corruption using Jackrabbit 1.3.1?
>
> hi shaun,
>
> On 8/16/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> > Hi all,
> >
> > We upgraded to JackRabbit 1.3.1 a few days ago.
> >
> > We have since seen a couple of occasions where we've been able to get
the
> > repository in an indeterminate state. The following output shows the
state
> > of a node which has an ordered child node property called
acme:components
> > e.g.
> >
> >
> >
> > [miq:FooBar] > nt:base
> >
> >                orderable
> >
> >                + acme:components (acme:Component) multiple COPY
> >
> >
> >
> > We have an instance of FooBar where acme:components[5] has disappeared??
> >
> > e.g.
> >
> >
> >
> > name                           type            node      new
> modified
> >
> > ------------------------------ --------------- --------- ---------
> ---------
> >
> > acme:components                 acme:Section     true      false
false
> >
> > acme:components[2]              acme:Text        true      false
false
> >
> > acme:components[3]              acme:Text        true      false
false
> >
> > acme:components[4]              acme:Text        true      false
false
> >
> > acme:components[6]              acme:Section     true      false
false
> >
> > acme:components[7]              acme:Section     true      false
false
> >
> > jcr:created                    Date            false     false     false
> >
> > jcr:primaryType                Name            false     false     false
> >
> > jcr:uuid                       String          false     false     false
> >
> >
> >
> > I presume this could happen if the deletion of the child node succeeded
by
> > the saving of the parent FooBar node failed for some reason?
>
> that should be possible since the changelog of a save operation is stored
> atomically. if an error occurs during processing of the change log all
> previous changes are rolled back.
>
> >
> >
> >
> > Surely this is a state that should never happen?
>
> absolutely, and the problem you're describing is very alarming indeed!
>
> did you notice anything peculiar about the corrupt nodes? is there a
chance
> to reconstruct the steps that lead to this state?
>
> furthermore, could you please share some details about your
> config/deployment?
>
> the only possible explanation i can currently come up with is
> that there are multiple jackrabbit instances accessing the same
> database...
>
> cheers
> stefan
>
>
> >
> >
> >
> > Regards,
> >
> > Shaun
> >
> >
> >
> >
>
>

Mime
View raw message