manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Question about using ManifolfCF Repository Connectors
Date Wed, 16 Jul 2014 13:34:41 GMT
Hi Prasad,

Yes, please create a ticket for any problems you find in the connector
implementations.  If a document is missing in CMIS, the CMIS connector
should definitely be calling IProcessActivity.deleteDocument().

Karl
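
As an illustration of the behaviour discussed in this thread (skip documents whose version is unchanged, and issue a delete when the repository reports the document as missing), here is a minimal Java sketch. It is not ManifoldCF code: Repository, ProcessActivity, ObjectNotFoundException, CrawlSketch and RecordingActivity are hypothetical stand-ins invented for this example, and only the method name deleteDocument comes from the thread itself. The real IProcessActivity and CMIS exception classes have different signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for CmisObjectNotFoundException (not the real CMIS class).
class ObjectNotFoundException extends RuntimeException {
    ObjectNotFoundException(String id) { super("not found: " + id); }
}

// Hypothetical stand-in for the repository being crawled.
interface Repository {
    // Returns the current version string, or throws ObjectNotFoundException.
    String fetchVersion(String documentId);
}

// Hypothetical, simplified activity object. Only the name deleteDocument
// appears in the thread; needsReindexing and ingest are assumptions.
interface ProcessActivity {
    boolean needsReindexing(String documentId, String version);
    void ingest(String documentId, String version);
    void deleteDocument(String documentId);
}

class CrawlSketch {
    // One merged pass per document: look up the version, delete if the
    // document is gone, and re-index only if the version has changed.
    static void processDocuments(List<String> ids, Repository repo, ProcessActivity activity) {
        for (String id : ids) {
            String version;
            try {
                version = repo.fetchVersion(id);
            } catch (ObjectNotFoundException e) {
                // Missing in the repository: tell the output side to remove it.
                activity.deleteDocument(id);
                continue;
            }
            if (activity.needsReindexing(id, version)) {
                activity.ingest(id, version);
            }
        }
    }
}

// In-memory activity that remembers the last indexed version per document
// and records which documents were ingested or deleted.
class RecordingActivity implements ProcessActivity {
    final Map<String, String> indexed = new HashMap<>();
    final List<String> ingested = new ArrayList<>();
    final List<String> deleted = new ArrayList<>();

    RecordingActivity(Map<String, String> previouslyIndexed) {
        indexed.putAll(previouslyIndexed);
    }

    public boolean needsReindexing(String id, String version) {
        // A document never indexed before has no stored version, so it is re-indexed.
        return !version.equals(indexed.get(id));
    }

    public void ingest(String id, String version) {
        indexed.put(id, version);
        ingested.add(id);
    }

    public void deleteDocument(String id) {
        indexed.remove(id);
        deleted.add(id);
    }
}
```

With three previously indexed documents, of which one is unchanged, one has a new version and one has been removed from the repository, a call to CrawlSketch.processDocuments ingests only the changed document and deletes only the missing one.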



On Wed, Jul 16, 2014 at 9:07 AM, Paththamestrige Perera <
prasad.srimal.perera@gmail.com> wrote:

> Hello Karl,
>
> Thanks for the reply! I'm going to check out the ManifoldCF trunk and look at
> these new changes. I see now that WorkerThread has delegated the version
> check to the connector level, which is a better approach!
> I also saw CONNECTORS-994
> <https://issues.apache.org/jira/browse/CONNECTORS-994>. Thanks for that.
>
> One other thing I would like to see is handling of
> CmisObjectNotFoundException in the CMIS connector (and the corresponding
> exception handling for Alfresco), which can be useful for sending delete
> document calls to output connectors. Do you think that is a proper approach?
> I would be happy to create a ticket for that.
>
> Thanks!
>
> Prasad Perera.
>
>
> On Tue, Jul 15, 2014 at 5:41 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Prasad,
>>
>> All changes to connector APIs will be backwards compatible provided you
>> extend the base connector class.
>>
>> Thanks,
>> Karl
>>
>>
>> On Tue, Jul 15, 2014 at 5:35 PM, Paththamestrige Perera <
>> prasad.srimal.perera@gmail.com> wrote:
>>
>>> Hello Karl,
>>>
>>> Thanks for the quick reply!
>>>
>>> I'm using MCF 1.6 and I haven't checked version 1.7 yet (I see it has a
>>> release date set to 31st of August).
>>>
>>> Regarding the API changes (I assume) you mentioned in the second
>>> reply, will there be major changes for the output connector as well? (For
>>> example, will the methods addOrReplaceDocument and removeDocument be
>>> altered as well?) I have my own output connector, working with a
>>> customized indexing system, and I am curious to know how things may change
>>> from 1.6 to 1.7.
>>>
>>> If it matters, I would be glad to create a ticket regarding document
>>> version handling in the repository connectors for version 1.6, and would be
>>> happy to get those changes into my project space.
>>>
>>> Thanks!
>>>
>>> Prasad Perera.
>>>
>>>
>>> On Tue, Jul 15, 2014 at 5:16 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Prasad,
>>>>
>>>> Re: the scanOnly flag: Technically it is up to your connector to
>>>> determine how to use this flag.  It is set when the document has not
>>>> changed from the previous run.
>>>>
>>>> The flag was originally added to help support chained models before
>>>> explicit CHAINED model choices were implemented in the framework.  For
>>>> chained models, discovery would not necessarily work correctly unless all
>>>> references could be rediscovered at all times.  In MCF 1.7, all of this
>>>> will be deprecated, and the getDocumentVersions() and processDocuments()
>>>> methods are in fact merged into one method, and an IProcessActivity method
>>>> is provided to check for differences from the previous indexing.
>>>>
>>>> Hope this answers your question.
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Tue, Jul 15, 2014 at 5:06 PM, Paththamestrige Perera <
>>>> prasad.srimal.perera@gmail.com> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I'm new to Apache ManifoldCF and I have spent some time referring to the
>>>>> publication 'ManifoldCF in Action' as well. I have started using the
>>>>> ManifoldCF system with the available repository connectors: the CMIS
>>>>> Repository Connector, the Alfresco Repository Connector and the File
>>>>> System Connector.
>>>>>
>>>>> I have used them as continuous crawlers with specific re-crawl
>>>>> intervals. What I have noticed is that, regardless of the document version
>>>>> (whether it has changed or not), the CMIS and Alfresco connectors process
>>>>> all seeded documents in every re-crawl job. I took a look at their
>>>>> implementations and, as far as I could see, these repository connectors do
>>>>> not use the 'scanOnly' property, which hints whether the document version
>>>>> has changed, when processing seeded documents. It seems intentional by
>>>>> design. So I would like to know why it is necessary to process all seeded
>>>>> documents (as opposed to processing only documents that were updated
>>>>> within the re-crawl interval)?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Prasad Perera.
>>>>>
>>>>
>>>>
>>>
>>
>
