jackrabbit-users mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: Node corruption using Jackrabbit 1.3.1?
Date Fri, 17 Aug 2007 13:03:54 GMT
hi shaun,

are you sure that this is a 1.3.1 specific issue?

i remember an earlier post where you described the same problem,
but apparently you weren't using 1.3.1:
http://www.nabble.com/Strange-%22ignoring-nonexistent-item%22-and-removeitem-fails-tf4169086.html

On 8/17/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> Hi Stefan et al,
> Further update on this, plus some answers to your questions.
>
> The consistency check and fix logic in JackRabbit 1.3.1 solved all but 1 of
> the issues. However although the log reports the remaining issue has been
> fixed each time, this message appears after repeated restarts :(
>
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager  -
> acme: checked 1000/0 bundles...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager  -
> NodeState fe75116c-5617-423b-8c9a-4a964b667f20 references unexistent child
> {http://www.acme.co.uk/xmlns/contentmodel}components with id
> d3c09b52-d3be-4d3c-8807-b7827d337973
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager  -
> acme: checked 2000/0 bundles...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager  -
> acme: Fixing 1 inconsistent bundle(s)...
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager  -
> acme: Fixing bundle fe75116c-5617-423b-8c9a-4a964b667f20
> org.apache.jackrabbit.core.persistence.bundle.BundleDbPersistenceManager  -
> acme: checked 2505/0 bundles.
>
> Is the consistency checker the only way to fix up these problems, or is
> there any way we can 'open the hood' to investigate further?

only by getting your hands really dirty and by delving deep into the code...
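
for illustration only: the warning in the log above amounts to a parent bundle listing a child id for which no bundle exists. below is a minimal sketch of that kind of check in plain java, working on an in-memory map instead of the database. it is not the actual BundleDbPersistenceManager code; the class name, the method name and the shortened ids are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not Jackrabbit's consistency checker.
public class DanglingChildCheck {

    // bundles maps a node id to the ids of the child nodes it references.
    // Returns one "parent -> child" entry for every referenced child
    // that has no bundle of its own.
    static List<String> findDanglingChildren(Map<String, List<String>> bundles) {
        List<String> dangling = new ArrayList<String>();
        for (Map.Entry<String, List<String>> entry : bundles.entrySet()) {
            for (String childId : entry.getValue()) {
                if (!bundles.containsKey(childId)) {
                    dangling.add(entry.getKey() + " -> " + childId);
                }
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        // Placeholder ids, shortened for readability.
        Map<String, List<String>> bundles = new LinkedHashMap<String, List<String>>();
        bundles.put("parent-1", Arrays.asList("child-1", "child-2"));
        bundles.put("child-1", Collections.<String>emptyList());
        // "child-2" is referenced by parent-1 but has no bundle.
        for (String ref : findDanglingChildren(bundles)) {
            System.out.println("references nonexistent child: " + ref);
        }
    }
}
```

a fix along these lines would drop the dangling entry from the parent's child list and store the parent again. if the same warning returns after every restart, one thing worth checking is whether that repaired parent bundle actually gets written back to the database.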

>
> Stefan wrote:
> "did you notice anything peculiar about the corrupt nodes? is there a chance
> to reconstruct the steps that lead to this state?"
>
> What tools would you recommend using to review the corrupt nodes? We only
> currently use the command contrib. project.
>
> Reproducing this scenario is proving really difficult. The original
> corruption occurred when a user was creating a particularly complex node
> object which included the creation, deletion and re-ordering of various
> same-name-siblings. After many hours of attempts we have yet to reproduce
> the event. Frustrating, but we know it's occurred at least twice.
>
> "furthermore, could you please share some details about your
> config/deployment?"
>
> Sure.
>  - JackRabbit 1.3.1
>         - MySql Bundle Persistence Manager
>         - Clustered across 2 nodes - only 1 node is read-write, the other is
> read-only to the repos
>       - Spring used to provide a JackRabbit JCRSessionInHttpSession pattern
> for the editors who are using a web-based UI.

i am not familiar with this. how is the repository instance accessed/created?
can you rule out the possibility that a 3rd r/w non-cluster aware instance is
created?

>  - MySql 5.0.45
>  - Tomcat 5.0.30
>  - Sun JDK 1.5
>  - Redhat Enterprise Linux
>
> All suggestions welcome.

hmm, just a few random guesses....

could be
- a bundle db pm-related issue
- a clustering- or clustering-config related issue
- an issue caused by multiple r/w jackrabbit instances
  accessing the same db
- a jr core issue

since this is a rather sophisticated setup it's not gonna be easy to
investigate.
however, we'd definitely need more information about the operations that led
to the corrupt state.

btw: please feel free to create a jira issue.

cheers
stefan

> Regards,
> Shaun
>
>
> -----Original Message-----
> From: Stefan Guggisberg [mailto:stefan.guggisberg@gmail.com]
> Sent: 17 August 2007 11:26
> To: users@jackrabbit.apache.org
> Subject: Re: Node corruption using Jackrabbit 1.3.1?
>
> hi shaun,
>
> On 8/16/07, sbarriba <sbarriba@yahoo.co.uk> wrote:
> > Hi all,
> >
> > We upgraded to JackRabbit 1.3.1 a few days ago.
> >
> > We have since seen a couple of occasions where we've been able to get the
> > repository in an indeterminate state. The following output shows the state
> > of a node which has an ordered child node property called acme:components
> > e.g.
> >
> >
> >
> > [miq:FooBar] > nt:base
> >
> >                orderable
> >
> >                + acme:components (acme:Component) multiple COPY
> >
> >
> >
> > We have an instance of FooBar where acme:components[5] has disappeared??
> >
> > e.g.
> >
> >
> >
> > name                           type            node      new       modified
> >
> > ------------------------------ --------------- --------- --------- ---------
> >
> > acme:components                 acme:Section     true      false     false
> >
> > acme:components[2]              acme:Text        true      false     false
> >
> > acme:components[3]              acme:Text        true      false     false
> >
> > acme:components[4]              acme:Text        true      false     false
> >
> > acme:components[6]              acme:Section     true      false     false
> >
> > acme:components[7]              acme:Section     true      false     false
> >
> > jcr:created                    Date            false     false     false
> >
> > jcr:primaryType                Name            false     false     false
> >
> > jcr:uuid                       String          false     false     false
> >
> >
> >
> > I presume this could happen if the deletion of the child node succeeded but
> > the saving of the parent FooBar node failed for some reason?
>
> that shouldn't be possible since the changelog of a save operation is stored
> atomically. if an error occurs during processing of the change log all
> previous changes are rolled back.
>
> >
> >
> >
> > Surely this is a state that should never happen?
>
> absolutely, and the problem you're describing is very alarming indeed!
>
> did you notice anything peculiar about the corrupt nodes? is there a chance
> to reconstruct the steps that lead to this state?
>
> furthermore, could you please share some details about your
> config/deployment?
>
> the only possible explanation i can currently come up with is
> that there are multiple jackrabbit instances accessing the same
> database...
>
> cheers
> stefan
>
>
> >
> >
> >
> > Regards,
> >
> > Shaun
> >
> >
> >
> >
>
>
