jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benjamin Papez (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-2357) Duplicate entries in the index
Date Mon, 19 Oct 2009 23:32:59 GMT

    [ https://issues.apache.org/jira/browse/JCR-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767613#action_12767613
] 

Benjamin Papez commented on JCR-2357:
-------------------------------------

Hello,

indeed doc being 0 is not the problem. It was just by accident returned for the watched node.
As multiple nodes are duplicated in the index, I have extended my breakpoint conditions and
other nodes have doc numbers > 0, although different from the one in Luke, so it looks
like the doc number is an identifier in the segment (?).
Also I have seen doc numbers 0 and > 0 properly being deleted from the index. So this is
not the right place to track down the problem.

I have found another hotspot right now, which I still need to investigate. After the above
shown stacktraces, the same document is created again with this stacktrace:

NodeIndexer.createDoc() line: 247	
SearchIndex.createDocument(NodeState, NamespaceMappings, IndexFormatVersion) line: 1122	
SearchIndex.updateNodes(Iterator<NodeId>, Iterator<NodeState>) line: 635	
SearchManager.onEvent(EventIterator) line: 459	
EventConsumer.consumeEvents(EventStateCollection) line: 244	
ObservationDispatcher.dispatchEvents(EventStateCollection) line: 198	
EventStateCollection.dispatch() line: 455	
SharedItemStateManager$Update.end() line: 756	
SharedItemStateManager.update(ChangeLog, EventStateCollectionFactory) line: 1102	
XAItemStateManager(LocalItemStateManager).update(ChangeLog) line: 351	
XAItemStateManager.update(ChangeLog) line: 354	
XAItemStateManager(LocalItemStateManager).update() line: 326	
SessionItemStateManager.update() line: 326	
NodeImpl(ItemImpl).save() line: 1098	
JackrabbitNodeWrapper(JCRItemWrapperImpl).save() line: 162	
JCRNodeWrapperImpl.copyFile(String, String) line: 1293	
JCRNodeWrapperImpl.copyFile(String) line: 1282	

This one is happening in the SearchManager event consumer, which comes after the FieldReferenceListener
and RulesListener (shown in the above stacktraces). But because the previous event consumers
triggered a save themselves, there already was a nested call to SearchManager.onEvent, which
already created an entry for the node. 
Now in this last case, I can see that the node-id is NOT in the removedNodes collection of
the SearchManager.onEvent method, so it is only created again (although it already exists)
and is not deleted. There is a NODE_ADDED event for this node and also several PROPERTY_ADDED
events, that is why its not in the removedNodes. So I think this is the reason why there are
duplicate index entries.

> Duplicate entries in the index
> ------------------------------
>
>                 Key: JCR-2357
>                 URL: https://issues.apache.org/jira/browse/JCR-2357
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: indexing
>    Affects Versions: 1.5.0, 1.6.0, 2.0-alpha9
>            Reporter: Benjamin Papez
>            Priority: Critical
>
> We are very frequently having problems with queries, because the results are always duplicated.
The reason for it has been identified to be in the indexing, because when we get the problem,
we now look at the index with Luke, and we can see that documents with the same UUID are there
twice. This happens with Jackrabbit 1.5.0, 1.6.0 and also with 2.0-alpha10. 
> I am able to recreate the problem quite regular. We have some event listeners, which
themselves call the node.save or session.save method. I made a breakpoint in NodeIndexer.createDoc()
and stop when there is a specific name in the fullpath property. I can see that two listeners
we wrote ( FieldReferenceListener and RulesListener) lead to the NodeIndexer.createDoc() -
below are the stacktraces (based on Jackrabbit 2.0-alpha10)
>  NodeIndexer.createDoc() line: 247 
>  SearchIndex.createDocument(NodeState, NamespaceMappings, IndexFormatVersion) line: 1122

>  SearchIndex.updateNodes(Iterator<NodeId>, Iterator<NodeState>) line: 635

>  SearchManager.onEvent(EventIterator) line: 459 
>  EventConsumer.consumeEvents(EventStateCollection) line: 244 
>  ObservationDispatcher.dispatchEvents(EventStateCollection) line: 198 
>  EventStateCollection.dispatch() line: 455 
>  SharedItemStateManager$Update.end() line: 756 
>  SharedItemStateManager.update(ChangeLog, EventStateCollectionFactory) line: 1102 
>  XAItemStateManager(LocalItemStateManager).update(ChangeLog) line: 351 
>  XAItemStateManager.update(ChangeLog) line: 354 
>  XAItemStateManager(LocalItemStateManager).update() line: 326 
>  SessionItemStateManager.update() line: 326 
>  NodeImpl(ItemImpl).save() line: 1098 
>  FieldReferenceListener.updateFullPath(Node) line: 193 
>  FieldReferenceListener.onEvent(EventIterator) line: 101 
>  EventConsumer.consumeEvents(EventStateCollection) line: 244 
>  ObservationDispatcher.dispatchEvents(EventStateCollection) line: 198 
>  EventStateCollection.dispatch() line: 455 
>  SharedItemStateManager$Update.end() line: 756 
>  SharedItemStateManager.update(ChangeLog, EventStateCollectionFactory) line: 1102 
>  XAItemStateManager(LocalItemStateManager).update(ChangeLog) line: 351 
>  XAItemStateManager.update(ChangeLog) line: 354 
>  XAItemStateManager(LocalItemStateManager).update() line: 326 
>  SessionItemStateManager.update() line: 326 
>  NodeImpl(ItemImpl).save() line: 1098 
>  NodeIndexer.createDoc() line: 247 
>  SearchIndex.createDocument(NodeState, NamespaceMappings, IndexFormatVersion) line: 1122

>  SearchIndex.updateNodes(Iterator<NodeId>, Iterator<NodeState>) line: 635

>  SearchManager.onEvent(EventIterator) line: 459 
>  EventConsumer.consumeEvents(EventStateCollection) line: 244 
>  ObservationDispatcher.dispatchEvents(EventStateCollection) line: 198 
>  EventStateCollection.dispatch() line: 455 
>  SharedItemStateManager$Update.end() line: 756 
>  SharedItemStateManager.update(ChangeLog, EventStateCollectionFactory) line: 1102 
>  XAItemStateManager(LocalItemStateManager).update(ChangeLog) line: 351 
>  XAItemStateManager.update(ChangeLog) line: 354 
>  XAItemStateManager(LocalItemStateManager).update() line: 326 
>  SessionItemStateManager.update() line: 326 
>  NodeImpl(ItemImpl).save() line: 1098 
>  XASessionImpl(SessionImpl).save() line: 928 
>  RulesListener.onEvent(EventIterator) line: 376 
>  EventConsumer.consumeEvents(EventStateCollection) line: 244 
>  ObservationDispatcher.dispatchEvents(EventStateCollection) line: 198 
>  EventStateCollection.dispatch() line: 455 
>  SharedItemStateManager$Update.end() line: 756 
>  SharedItemStateManager.update(ChangeLog, EventStateCollectionFactory) line: 1102 
>  XAItemStateManager(LocalItemStateManager).update(ChangeLog) line: 351 
>  XAItemStateManager.update(ChangeLog) line: 354 
>  XAItemStateManager(LocalItemStateManager).update() line: 326 
>  SessionItemStateManager.update() line: 326 
>  NodeImpl(ItemImpl).save() line: 1098 
> Now when continuing to debug after the second NodeIndexer.createDoc() I see that the
node with the watched UUID is also passed within the remove collection into  SearchIndex.updateNodes(Iterator<NodeId>,
Iterator<NodeState>) , but when looking at the index with Luke before and after finishing
this SearchIndex.updateNodes, I see that the first document with the UUID was NOT deleted
and then we always have two documents with the same UUID.
> Now it would be interesting, why the deletion failed. When debugging DeleteNode.exceute
and stopping at the watched UUID, I come down the following methods:
> CommittableIndexReader(IndexReader).deleteDocuments(Term) line: 873	
> PersistentIndex(AbstractIndex).removeDocument(Term) line: 225	
> PersistentIndex.removeDocument(Term) line: 90	
> MultiIndex$DeleteNode.execute(MultiIndex) line: 1951	
> MultiIndex.executeAndLog(MultiIndex$Action) line: 1084	
> MultiIndex.update(Collection<NodeId>, Collection<Document>) line: 442	
> SearchIndex.updateNodes(Iterator<NodeId>, Iterator<NodeState>) line: 645

> What I can see in IndexReader.deleteDocuments(Term) is that docs.next() returns true,
but docs.doc() then returns 0 as the doc-id, which I think is the cause that the right doc-id
is not deleted. 
> Any clues what goes wrong here or hints what else I could watch for on debugging ?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message