manifoldcf-user mailing list archives

From Jitu <abj...@gmail.com>
Subject Re: question regarding manifoldcf
Date Tue, 29 Jul 2014 19:22:10 GMT
I have checked out trunk from the location below and built it, but I can
still see it crawling the same file again and again.

svn checkout http://svn.apache.org/repos/asf/manifoldcf/trunk mcf-trunk

My configuration :
Nuxeo input connector


Max connections:    10
Connection type:    CMIS
Authority group:    None (global authority)
Parameters:
username=Administrator
password=********
binding=atom
protocol=http
server=localhost
port=8080
path=/nuxeo/atom/cmis
repositoryId=

*Output connector:* Solr connector with max connections 10. As far as I
know, the output connector has no information about whether it is the same
file or a different one.

*Job configuration:*
Priority:     5
Start method:     Start at beginning of schedule window
Schedule type:     Rescan documents dynamically
Minimum recrawl interval:     10 minutes
Maximum recrawl interval:     Infinity
Expiration interval:     Infinity
Reseed interval:     60 minutes
No scheduled run times
No forced metadata
Maximum hop count for link type 'child':     Unlimited
Hop count mode:     Delete unreachable documents


I have only one file in my Nuxeo repository, and every 10 minutes I see the
same file sent to the output connector again. That is, the
addOrReplaceDocument method of the output connector is called even though
the file in the Nuxeo repository has not changed.
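
To make the expected behaviour concrete, here is a minimal sketch of change detection by version string. It is only an illustration under stated assumptions, not ManifoldCF code: the class VersionTracker and its method needsIndexing are hypothetical names, and the version string stands for whatever value the repository connector derives per document (for example a last-modification timestamp).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the ManifoldCF API.
class VersionTracker {
    // Version string recorded the last time each document was indexed.
    private final Map<String, String> lastIndexedVersion = new HashMap<>();

    // Returns true only when the document is new or its version string
    // differs from the one recorded at the previous crawl.
    boolean needsIndexing(String documentId, String currentVersion) {
        String previous = lastIndexedVersion.get(documentId);
        if (currentVersion.equals(previous)) {
            return false; // unchanged: skip the call to the output connector
        }
        lastIndexedVersion.put(documentId, currentVersion);
        return true; // new or modified: hand over to the output connector
    }
}
```

With a check of this kind, an unchanged file would be looked at on every 10-minute rescan but passed to the output connector only once.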

regards,
Jitu



On Tue, Jul 29, 2014 at 11:27 PM, Jitu <abjitu@gmail.com> wrote:

> Thanks Karl and Prasad. It's great to hear back so quickly. Thanks for the
> info, it really helped me.
>
> Thanks for the support
>
> Regards,
> Jitu
>
>
> On Tue, Jul 29, 2014 at 10:41 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Jitu,
>>
>> The bug is that the CMIS and Alfresco connectors reindexed documents even
>> though they had not changed.  This is now corrected.
>>
>> Karl
>>
>>
>>
>> On Tue, Jul 29, 2014 at 12:28 PM, Jitu <abjitu@gmail.com> wrote:
>>
>>> Hi Prasad,
>>>           Thanks for the reply. The bug says "The CMIS and Alfresco
>>> connectors currently do not look at scanOnly but should". Does that mean
>>> the CMIS and Alfresco connectors crawl all the files and hand them over
>>> to the output connector whether or not they have been modified? Ideally a
>>> file should be crawled only if it has been modified. Am I correct?
>>>
>>> regards,
>>> jitu
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 29, 2014 at 9:19 PM, Paththamestrige Perera <
>>> prasad.srimal.perera@gmail.com> wrote:
>>>
>>>> Hello Jitu, I had the same issue, and it was fixed by CONNECTORS-994
>>>> <https://issues.apache.org/jira/browse/CONNECTORS-994> for MCF 1.7.
>>>> If you check out mcf-trunk, it will work as expected.
>>>>
>>>>
>>>>
>>>> On Tue, Jul 29, 2014 at 11:31 AM, Jitu <abjitu@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am a freelancer. For my current project I am using the ManifoldCF
>>>>> framework, where I need to pull documents from a CMIS repository and
>>>>> output them to a Solr connector.
>>>>>
>>>>> But I noticed that when I set the job type to continuous, it crawls all
>>>>> the files every time, whether or not they have been modified. My
>>>>> requirement is to crawl a file again only if it has been modified.
>>>>>
>>>>> How can I do this with ManifoldCF?
>>>>>
>>>>> Regards,
>>>>> abjitu
>>>>>
>>>>
>>>>
>>>
>>
>
