jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benjamin Papez (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-2357) Duplicate entries in the index
Date Sun, 18 Oct 2009 13:07:31 GMT

    [ https://issues.apache.org/jira/browse/JCR-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767057#action_12767057
] 

Benjamin Papez commented on JCR-2357:
-------------------------------------

Hello Marcel,

thanks for your first input.

>> [...] but when looking at the index with Luke before and after finishing this SearchIndex.updateNodes,
I see that the first document
>> with the UUID was NOT deleted [...]

>deletions are not immediately persisted but may be held in memory for a while (durability
is guaranteed by using a redo log). that may be the reason why you still see the >node
as not deleted.

>does a restart help to correct the problem, or do you have to re-index the workspace?

I did a shutdown of my Tomcat process and restarted the server. The index still had the duplicate
UUID's and the query results were returned twice. 
I then stopped the server again, deleted the index of the default workspace, started the server
and then I could already see in Luke that the there was only one entry for the watched UUID
and the query results then also had no duplicates.

>> [...] docs.next() returns true, but docs.doc() then returns 0 as the doc-id [...]

>0 is a valid document number. but there might still be something wrong...

If IndexReader.deleteDocuments(Term) is called with the UUID as term and there is only one
document with this UUID and it is e.g. doc-id 880 (as seen in Luke), is it then valid that
0 is returned as document number ? Due to that the document is not deleted and a second document
with the same UUID is created.

It could perhaps also be a Lucene bug. Do you have more insight on how this SegmentTermDocs.next()
works and if the returning of the 0 doc-id can be investigated further ?

>are you able to strip down your code and provide a simple test that reproduces the issue?
I guess it's not that easy ;) 

This may be a difficult task, as the test happens on a site import of our CMS. I would need
to try writing a simple unit test with event listeners doing the same, I am not sure I will
be able to reproduce it like that.

Regards,
Benjamin

> Duplicate entries in the index
> ------------------------------
>
>                 Key: JCR-2357
>                 URL: https://issues.apache.org/jira/browse/JCR-2357
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: indexing
>    Affects Versions: 1.5.0, 1.6.0, 2.0-alpha9
>            Reporter: Benjamin Papez
>            Priority: Critical
>
> We are very frequently having problems with queries, because the results are always duplicated.
The reason for it has been identified to be in the indexing, because when we get the problem,
we now look at the index with Luke, and we can see that documents with the same UUID are there
twice. This happens with Jackrabbit 1.5.0, 1.6.0 and also with 2.0-alpha10. 
> I am able to recreate the problem quite regular. We have some event listeners, which
themselves call the node.save or session.save method. I made a breakpoint in NodeIndexer.createDoc()
and stop when there is a specific name in the fullpath property. I can see that two listeners
we wrote ( FieldReferenceListener and RulesListener) lead to the NodeIndexer.createDoc() -
below are the stacktraces (based on Jackrabbit 2.0-alpha10)
>  NodeIndexer.createDoc() line: 247 
>  SearchIndex.createDocument(NodeState, NamespaceMappings, IndexFormatVersion) line: 1122

>  SearchIndex.updateNodes(Iterator<NodeId>, Iterator<NodeState>) line: 635

>  SearchManager.onEvent(EventIterator) line: 459 
>  EventConsumer.consumeEvents(EventStateCollection) line: 244 
>  ObservationDispatcher.dispatchEvents(EventStateCollection) line: 198 
>  EventStateCollection.dispatch() line: 455 
>  SharedItemStateManager$Update.end() line: 756 
>  SharedItemStateManager.update(ChangeLog, EventStateCollectionFactory) line: 1102 
>  XAItemStateManager(LocalItemStateManager).update(ChangeLog) line: 351 
>  XAItemStateManager.update(ChangeLog) line: 354 
>  XAItemStateManager(LocalItemStateManager).update() line: 326 
>  SessionItemStateManager.update() line: 326 
>  NodeImpl(ItemImpl).save() line: 1098 
>  FieldReferenceListener.updateFullPath(Node) line: 193 
>  FieldReferenceListener.onEvent(EventIterator) line: 101 
>  EventConsumer.consumeEvents(EventStateCollection) line: 244 
>  ObservationDispatcher.dispatchEvents(EventStateCollection) line: 198 
>  EventStateCollection.dispatch() line: 455 
>  SharedItemStateManager$Update.end() line: 756 
>  SharedItemStateManager.update(ChangeLog, EventStateCollectionFactory) line: 1102 
>  XAItemStateManager(LocalItemStateManager).update(ChangeLog) line: 351 
>  XAItemStateManager.update(ChangeLog) line: 354 
>  XAItemStateManager(LocalItemStateManager).update() line: 326 
>  SessionItemStateManager.update() line: 326 
>  NodeImpl(ItemImpl).save() line: 1098 
>  NodeIndexer.createDoc() line: 247 
>  SearchIndex.createDocument(NodeState, NamespaceMappings, IndexFormatVersion) line: 1122

>  SearchIndex.updateNodes(Iterator<NodeId>, Iterator<NodeState>) line: 635

>  SearchManager.onEvent(EventIterator) line: 459 
>  EventConsumer.consumeEvents(EventStateCollection) line: 244 
>  ObservationDispatcher.dispatchEvents(EventStateCollection) line: 198 
>  EventStateCollection.dispatch() line: 455 
>  SharedItemStateManager$Update.end() line: 756 
>  SharedItemStateManager.update(ChangeLog, EventStateCollectionFactory) line: 1102 
>  XAItemStateManager(LocalItemStateManager).update(ChangeLog) line: 351 
>  XAItemStateManager.update(ChangeLog) line: 354 
>  XAItemStateManager(LocalItemStateManager).update() line: 326 
>  SessionItemStateManager.update() line: 326 
>  NodeImpl(ItemImpl).save() line: 1098 
>  XASessionImpl(SessionImpl).save() line: 928 
>  RulesListener.onEvent(EventIterator) line: 376 
>  EventConsumer.consumeEvents(EventStateCollection) line: 244 
>  ObservationDispatcher.dispatchEvents(EventStateCollection) line: 198 
>  EventStateCollection.dispatch() line: 455 
>  SharedItemStateManager$Update.end() line: 756 
>  SharedItemStateManager.update(ChangeLog, EventStateCollectionFactory) line: 1102 
>  XAItemStateManager(LocalItemStateManager).update(ChangeLog) line: 351 
>  XAItemStateManager.update(ChangeLog) line: 354 
>  XAItemStateManager(LocalItemStateManager).update() line: 326 
>  SessionItemStateManager.update() line: 326 
>  NodeImpl(ItemImpl).save() line: 1098 
> Now when continuing to debug after the second NodeIndexer.createDoc() I see that the
node with the watched UUID is also passed within the remove collection into  SearchIndex.updateNodes(Iterator<NodeId>,
Iterator<NodeState>) , but when looking at the index with Luke before and after finishing
this SearchIndex.updateNodes, I see that the first document with the UUID was NOT deleted
and then we always have two documents with the same UUID.
> Now it would be interesting, why the deletion failed. When debugging DeleteNode.exceute
and stopping at the watched UUID, I come down the following methods:
> CommittableIndexReader(IndexReader).deleteDocuments(Term) line: 873	
> PersistentIndex(AbstractIndex).removeDocument(Term) line: 225	
> PersistentIndex.removeDocument(Term) line: 90	
> MultiIndex$DeleteNode.execute(MultiIndex) line: 1951	
> MultiIndex.executeAndLog(MultiIndex$Action) line: 1084	
> MultiIndex.update(Collection<NodeId>, Collection<Document>) line: 442	
> SearchIndex.updateNodes(Iterator<NodeId>, Iterator<NodeState>) line: 645

> What I can see in IndexReader.deleteDocuments(Term) is that docs.next() returns true,
but docs.doc() then returns 0 as the doc-id, which I think is the cause that the right doc-id
is not deleted. 
> Any clues what goes wrong here or hints what else I could watch for on debugging ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message