manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1225) I have built manifoldcf with zookeeper syncronization with the followiing databases
Date Fri, 14 Aug 2015 12:33:45 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696945#comment-14696945 ]

Karl Wright commented on CONNECTORS-1225:
-----------------------------------------

I improved the scheduling logging to help determine exactly what is happening.  What I see is
a bit confusing, but so far I have not found any actual problem.

What happens is that documents get prioritized in batches of 1000.  This means that even under
the best of conditions, when MCF crawls against a single repository connection with no throttling
and with multiple jobs, it will tend to work on 1000 documents from one job at a time before
going on to documents from another job.  This, at least, seems to explain the observed behavior
for a job that has been run before.  I'm still looking into the behavior for two jobs that
have never been run before, to see if that can be explained too.
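
To make the batching effect concrete, here is a minimal toy model in Python. It is not ManifoldCF's actual scheduling code: the function names, the job names, and the simple rotation across jobs are all assumptions made for illustration. Only the batch size of 1000 comes from the comment above. The sketch shows how handing out priorities one batch at a time produces runs of up to 1000 documents from a single job.

```python
from itertools import groupby

BATCH_SIZE = 1000  # batch size mentioned in the comment above


def prioritize(jobs, batch_size=BATCH_SIZE):
    """Toy model (hypothetical, not MCF code): hand out priorities one
    batch at a time, rotating across jobs.

    `jobs` maps a job name to its number of queued documents.
    Returns the job name of each document in processing order.
    """
    remaining = dict(jobs)
    order = []
    while any(remaining.values()):
        for name in jobs:
            # Each job contributes at most one batch per rotation.
            take = min(batch_size, remaining[name])
            order.extend([name] * take)
            remaining[name] -= take
    return order


def runs(order):
    """Collapse the processing order into (job, run_length) pairs."""
    return [(name, len(list(group))) for name, group in groupby(order)]


print(runs(prioritize({"job_a": 2500, "job_b": 2000})))
```

In this model the two jobs are not interleaved document by document. The output is a sequence of 1000-document runs that alternate between the jobs, which matches the pattern described above of one job appearing to be worked on at a time.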

> I have built manifoldcf with zookeeper syncronization with the followiing databases
> -----------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1225
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1225
>             Project: ManifoldCF
>          Issue Type: Test
>            Reporter: annamaneni raveendra
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
