manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-567) Extended seeding interface which provides document versions
Date Thu, 15 Nov 2012 13:06:12 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498003#comment-13498003 ]

Karl Wright commented on CONNECTORS-567:
----------------------------------------

I think not being able to handle deletions is a significant problem, since this is an incremental
crawler.  We'd have to have a solution to that problem before this could be a possibility.
Right now the only way a deletion is detected is by requesting the version string for a document
that no longer exists.
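
To illustrate the deletion check described above, here is a minimal sketch. It is not ManifoldCF code: the class, the method names, and the toy in-memory repository are all hypothetical, and it assumes that asking for the version of a missing document yields null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are ManifoldCF API.
final class DeletionSketch {

    // Toy repository lookup: document id -> version string.
    // Assumption: a document that no longer exists has no entry, so this returns null.
    static String fetchVersion(Map<String, String> repository, String documentId) {
        return repository.get(documentId);
    }

    // Ask for the version of every previously crawled document and
    // collect the ids whose version can no longer be obtained.
    static List<String> findDeleted(Map<String, String> repository, List<String> knownIds) {
        List<String> deleted = new ArrayList<>();
        for (String id : knownIds) {
            if (fetchVersion(repository, id) == null) {
                deleted.add(id);
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        Map<String, String> repository = new HashMap<>();
        repository.put("doc-1", "v3"); // still present in the repository

        // doc-2 was crawled earlier but has since been removed.
        List<String> knownIds = Arrays.asList("doc-1", "doc-2");
        System.out.println(findDeleted(repository, knownIds));
    }
}
```

The point of the sketch is that a deletion only shows up when a version is requested for an already known document, so a scheme that skips the version request needs another way to notice missing documents.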

Also, FWIW, I make free copies of ManifoldCF in Action available to all committers, upon request.

                
> Extended seeding interface which provides document versions
> -----------------------------------------------------------
>
>                 Key: CONNECTORS-567
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-567
>             Project: ManifoldCF
>          Issue Type: Wish
>            Reporter: Maciej Lizewski
>
> There are some cases where the seeding function can provide the document version from data it
> already has.
> The current data flow needs one call to addSeedDocuments, then a call to getDocumentVersions,
> which essentially must fetch the same data, and after that one more call to processDocuments. The
> last one probably needs a separate call because it has to fetch the document body; however, seeding
> and getting versions in many cases work on the very same data (and probably duplicate requests
> to the repository).
> Reducing the number of requests to the repository by eliminating the getDocumentVersions
> call for documents whose version was returned by addSeedDocuments could significantly reduce
> load.
> getDocumentVersions would still be called for older documents (not returned by addSeedDocuments)
> to check whether they were modified or deleted.
> This is only a proposal...
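
The proposed flow can be sketched as follows. This is an illustration only, not the ManifoldCF connector interface: the class, the Plan type, and the map-based repository are hypothetical, and the sketch assumes seeding returns a version string for each document it reports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the proposal; none of these names are ManifoldCF API.
final class SeedWithVersionsSketch {

    // Outcome of one pass: the version per document, plus the ids that
    // still needed a separate version lookup against the repository.
    static final class Plan {
        final Map<String, String> versions = new TreeMap<>(); // a null value marks a missing document
        final List<String> versionLookups = new ArrayList<>();
    }

    // seeded: id -> version as delivered by the (assumed) extended seeding call.
    // previouslyKnown: ids crawled in earlier runs.
    // repository: toy stand-in for the remote system, id -> version.
    static Plan plan(Map<String, String> seeded,
                     Set<String> previouslyKnown,
                     Map<String, String> repository) {
        Plan plan = new Plan();
        // Versions that came with seeding need no further request.
        plan.versions.putAll(seeded);
        // Older documents not returned by seeding are still checked one by one,
        // which is also where a deleted document would be noticed.
        for (String id : new TreeSet<>(previouslyKnown)) {
            if (!seeded.containsKey(id)) {
                plan.versionLookups.add(id);
                plan.versions.put(id, repository.get(id));
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> repository = new HashMap<>();
        repository.put("a", "v2");
        repository.put("b", "v1");

        Map<String, String> seeded = new HashMap<>();
        seeded.put("a", "v2"); // seeding already knows the version of "a"

        Set<String> previouslyKnown = new TreeSet<>(Arrays.asList("a", "b", "c"));

        Plan plan = plan(seeded, previouslyKnown, repository);
        System.out.println("separate version lookups: " + plan.versionLookups);
        System.out.println("versions: " + plan.versions);
    }
}
```

In this toy run only "b" and "c" trigger a separate lookup, and "c" comes back without a version because it is no longer in the repository.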

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
