manifoldcf-dev mailing list archives

From "Prasad Perera (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1009) Cmis Repository Connector does not handle Document updating properly
Date Mon, 18 Aug 2014 14:10:18 GMT


Prasad Perera commented on CONNECTORS-1009:

Hello Karl,

After testing the new changes, updates and deletes seem to work OK. However, the only
remaining issue with updates is that the connector now sends a DELETE operation for the
previous version of the document to the output connector (search engine) and then sends the
new version of the document. This is acceptable, but it would be much better if an update
were simply another forwarding of the new version of the document, so that the search engine
could handle it as a proper update operation rather than as two operations (delete + add).
What would you suggest?
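
To make the difference concrete, here is a minimal sketch of a toy in-memory index (not ManifoldCF or any real search engine API; ToyIndex and its methods are hypothetical names made up for this illustration). It assumes the delete + add pair comes from the document key changing between versions, which is only one possible explanation of the behaviour described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy index used only to illustrate the two behaviours.
final class ToyIndex {
    private final Map<String, String> docs = new HashMap<>();
    // Log of the operations the index actually had to perform.
    final List<String> operations = new ArrayList<>();

    // Adds a document, or replaces it in place if the key already exists.
    void addOrReplace(String key, String content) {
        operations.add((docs.containsKey(key) ? "UPDATE " : "ADD ") + key);
        docs.put(key, content);
    }

    // Removes a document if present.
    void delete(String key) {
        if (docs.remove(key) != null) {
            operations.add("DELETE " + key);
        }
    }

    public static void main(String[] args) {
        // Key changes with the version: the old entry must be deleted explicitly.
        ToyIndex versioned = new ToyIndex();
        versioned.addOrReplace("doc;1.0", "first text");
        versioned.delete("doc;1.0");
        versioned.addOrReplace("doc;1.1", "second text");
        System.out.println(versioned.operations);

        // Key stays the same across versions: one operation is enough.
        ToyIndex stable = new ToyIndex();
        stable.addOrReplace("doc", "first text");
        stable.addOrReplace("doc", "second text");
        System.out.println(stable.operations);
    }
}
```

With a version-independent key the second submission is recorded as a single UPDATE, which is the behaviour I would prefer on the output side.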

> Cmis Repository Connector does not handle Document updating properly
> --------------------------------------------------------------------
>                 Key: CONNECTORS-1009
>                 URL:
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: CMIS connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Prasad Perera
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>         Attachments: std_logs.txt, std_prints.diff
> As part of the fix for CONNECTORS-1004, it seems CmisRepositoryConnector does not handle
> document updating properly.
> Case Scenario:
> * Create a continuous crawling job using CmisRepositoryConnector.
> * Update a document on the repository side.
> * The document keeps being submitted to the OutputConnector at each crawling interval, even
>   though it was not updated afterwards.
> One possible fix is needed in CmisRepositoryConnector:processDocument, at the call:
>  activities.ingestDocumentWithException(nodeId, version, documentURI, rd);
> The documentURI should point to the old document URI. (Now it points to the latest
> documentURI discovered, which may be confusing the document references?)
> Also, in ECM systems such as Alfresco, the document IDs include the version number
> as well.
> Ex: workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.0 --> version 1.0
> workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1 --> version 1.1
> When we set up a query to crawl a repository folder, we discover content by referring to
> the child nodes. Because of that, it now seems to queue all the document versions and
> submit them to the OutputConnector, thus producing duplicate documents on the output
> (search) side.
> Is there a way to avoid this problem? It would be great if the repository connector could
> just take the latest document version and submit it as an update.
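
The idea of taking only the latest version could be sketched roughly as follows. This is a standalone illustration, not code from CmisRepositoryConnector: the class LatestVersionFilter and its methods are hypothetical, and it assumes that the version label is whatever follows the last ';' in the ID (as in the Alfresco examples above) and that labels are dotted numbers such as 1.0 or 1.10.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: keeps only the newest version of each discovered node.
final class LatestVersionFilter {

    // Part of the ID before the last ';', or the whole ID if there is no ';'.
    static String baseId(String nodeId) {
        int sep = nodeId.lastIndexOf(';');
        return sep < 0 ? nodeId : nodeId.substring(0, sep);
    }

    // Part of the ID after the last ';', or "" if there is no ';'.
    static String versionLabel(String nodeId) {
        int sep = nodeId.lastIndexOf(';');
        return sep < 0 ? "" : nodeId.substring(sep + 1);
    }

    // Compares dotted numeric labels component by component, so 1.10 > 1.9.
    // Assumption: every component is a plain integer; other labels would throw.
    static int compareVersions(String a, String b) {
        String[] pa = a.isEmpty() ? new String[0] : a.split("\\.");
        String[] pb = b.isEmpty() ? new String[0] : b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int va = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int vb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (va != vb) {
                return Integer.compare(va, vb);
            }
        }
        return 0;
    }

    // Returns one ID per base ID: the one carrying the highest version label.
    static List<String> latestOnly(List<String> nodeIds) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String id : nodeIds) {
            String key = baseId(id);
            String current = latest.get(key);
            if (current == null
                    || compareVersions(versionLabel(id), versionLabel(current)) > 0) {
                latest.put(key, id);
            }
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<String> discovered = Arrays.asList(
            "workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.0",
            "workspace://SpacesStore/8e12a887-3fa8-48d6-8516-5bcfad358ba2;1.1");
        // Only the ;1.1 entry remains.
        System.out.println(latestOnly(discovered));
    }
}
```

The version-free value returned by baseId could also serve as a stable key on the output side, so that a newer version replaces the older one instead of appearing next to it as a duplicate. Whether that fits the connector's existing document identifier scheme is something I have not verified.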

This message was sent by Atlassian JIRA
