manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: question regarding manifoldcf
Date Tue, 29 Jul 2014 19:52:02 GMT
Hi Abjitu,

Some CMIS implementations do not support versioning.  See
connectors/cmis/connector/src/main/java/org/apache/manifoldcf/connectors/cmis/CmisRepositoryConnector.java,
around line 1306:

>>>>>>
        //we have to check if this CMIS repository support versioning
        // or if the versioning is disabled for this content
        if(StringUtils.isNotEmpty(document.getVersionLabel())){
          rval[i] = document.getVersionLabel();
        } else {
        //a CMIS document that doesn't contain versioning information will always be processed
          rval[i] = StringUtils.EMPTY;
        }
<<<<<<

In other words, if your repository does not return a meaningful value from
getVersionLabel(), ManifoldCF cannot tell that a document is unchanged and
will process it on every crawl.  You can confirm this by adding
System.out.println statements to the above block of code.
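
For illustration only, here is a minimal standalone sketch of that check with
the debug output added. The class and method names (VersionLabelCheck,
versionStringFor, needsReindex) are hypothetical and not part of ManifoldCF,
and needsReindex is a simplified model of how an empty version string leads
to reprocessing, not the framework's actual code:

```java
public class VersionLabelCheck {

    // Hypothetical stand-in for the connector block quoted above:
    // returns the version label, or an empty string when there is none.
    static String versionStringFor(String versionLabel) {
        if (versionLabel != null && !versionLabel.isEmpty()) {
            System.out.println("CMIS version label: '" + versionLabel + "'");
            return versionLabel;
        }
        // No usable label: the document cannot be recognized as unchanged.
        System.out.println("CMIS version label missing or empty");
        return "";
    }

    // Simplified assumption: an empty version string always forces
    // reprocessing; otherwise reprocess only when the string changed.
    static boolean needsReindex(String storedVersion, String currentVersion) {
        return currentVersion.isEmpty() || !currentVersion.equals(storedVersion);
    }

    public static void main(String[] args) {
        // Repository that reports a stable version label.
        String versioned = versionStringFor("1.0");
        System.out.println("reindex? " + needsReindex("1.0", versioned));

        // Repository (or document) with versioning unavailable.
        String unversioned = versionStringFor(null);
        System.out.println("reindex? " + needsReindex("", unversioned));
    }
}
```

If the second message is what shows up in your logs for the Nuxeo document,
the repeated indexing comes from the repository side rather than from the job
configuration.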

Thanks,
Karl



On Tue, Jul 29, 2014 at 3:22 PM, Jitu <abjitu@gmail.com> wrote:

> I have checked out trunk from the location below and made the build, but I
> can still see it crawling the same file again and again.
>
> svn checkout http://svn.apache.org/repos/asf/manifoldcf/trunk mcf-trunk
>
> My configuration :
> Nuxeo input connector
>
>
> Max connections:    10
> Connection type:    CMIS
> Authority group:    None (global authority)
> Parameters:
> username=Administrator
> password=********
> binding=atom
> protocol=http
> server=localhost
> port=8080
> path=/nuxeo/atom/cmis
> repositoryId=
>
> *Output connector:* Solr connector with max connections 10. As far as I
> know, the output connector has no information about whether it is the same
> file or a different one.
>
> *job configuration : *
> Priority:     5
> Start method:     Start at beginning of schedule window
> Schedule type:     Rescan documents dynamically
> Minimum recrawl interval:     10 minutes
> Maximum recrawl interval:     Infinity
> Expiration interval:     Infinity
> Reseed interval:     60 minutes
> No scheduled run times
> No forced metadata
> Maximum hop count for link type 'child':     Unlimited
> Hop count mode:     Delete unreachable documents
>
>
> I have only one file in my Nuxeo repository, and I see that every 10 minutes
> the same file is sent to the output connector again. That is, the call goes
> to the addOrReplaceDocument method inside the output connector even though
> there is no change to the file in the Nuxeo repository.
>
> regards,
> Jitu
>
>
>
> On Tue, Jul 29, 2014 at 11:27 PM, Jitu <abjitu@gmail.com> wrote:
>
>> Thanks Karl and Prasad. It's great to hear back so quickly. Thanks for the
>> info, it really helped me.
>>
>> Thanks for the support
>>
>> Regards,
>> Jitu
>>
>>
>> On Tue, Jul 29, 2014 at 10:41 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Jitu,
>>>
>>> The bug is that the CMIS and Alfresco connectors reindexed documents
>>> even though they had not changed.  This is now corrected.
>>>
>>> Karl
>>>
>>>
>>>
>>> On Tue, Jul 29, 2014 at 12:28 PM, Jitu <abjitu@gmail.com> wrote:
>>>
>>>> Hi Prasad,
>>>>           Thanks for the reply. The bug says "The CMIS and Alfresco
>>>> connectors currently do not look at scanOnly but should". Does that mean
>>>> the CMIS and Alfresco connectors crawl all the files and hand them over
>>>> to the output connector whether or not they were modified? Ideally a file
>>>> should be crawled only if it was modified. Am I correct?
>>>>
>>>> regards,
>>>> jitu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jul 29, 2014 at 9:19 PM, Paththamestrige Perera <
>>>> prasad.srimal.perera@gmail.com> wrote:
>>>>
>>>>> Hello Jitu, I had the same issue and this was fixed with
>>>>> CONNECTORS-994 <https://issues.apache.org/jira/browse/CONNECTORS-994>
>>>>> for MCF 1.7.
>>>>> If you check out mcf-trunk, it will work as expected.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jul 29, 2014 at 11:31 AM, Jitu <abjitu@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am a freelancer. For my current project I am using the ManifoldCF
>>>>>> framework, where I need to pull documents from a CMIS repository and
>>>>>> output them to the Solr connector.
>>>>>>
>>>>>> But I noticed that when I set the job type to continuous, it crawls all
>>>>>> the files every time, whether or not they were modified. My
>>>>>> requirement is to crawl a file again only if it was modified.
>>>>>>
>>>>>> How can I do this with ManifoldCF?
>>>>>>
>>>>>> Regards,
>>>>>> abjitu
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
