manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Parallelize jobs
Date Mon, 26 Feb 2018 16:17:12 GMT
FWIW, you can test this by interrupting the first job (using "pause" is
good enough), and then resuming it.  This will cause the first job's
documents to be reprioritized concurrently with the outstanding documents
from the other running jobs.  The same restriction still applies, though:
documents are prioritized at queuing time, and won't be reprioritized just
because you start other jobs.

Karl
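The pause/resume effect can be made concrete with a toy Python model. This is a sketch under stated assumptions, not ManifoldCF code. The job names, the document IDs, and especially the round-robin rule applied when the paused job is resumed are hypothetical. They only illustrate the difference between priorities fixed at queuing time and priorities recomputed alongside another job's outstanding documents.

```python
from itertools import chain, zip_longest


def order_as_queued(outstanding):
    """Toy model: priorities were fixed when each job queued its documents,
    so job A's backlog (queued first) is processed entirely before job B's."""
    return [doc for docs in outstanding.values() for doc in docs]


def order_after_resume(outstanding):
    """Toy model of pause + resume: the outstanding documents get new
    priorities, assumed here to be handed out round-robin across jobs.
    This interleaving rule is an illustration, not ManifoldCF's formula."""
    rounds = zip_longest(*outstanding.values())
    return [doc for doc in chain.from_iterable(rounds) if doc is not None]


# Job A was started first and has a backlog; job B was started later.
# Dicts keep insertion order, so A's entries come first.
outstanding = {
    "A": ["a0", "a1", "a2", "a3"],
    "B": ["b0", "b1", "b2"],
}

print(order_as_queued(outstanding))
# ['a0', 'a1', 'a2', 'a3', 'b0', 'b1', 'b2']
print(order_after_resume(outstanding))
# ['a0', 'b0', 'a1', 'b1', 'a2', 'b2', 'a3']
```

In the first ordering job B waits for all of job A's backlog. In the second, both jobs make progress together, which is the behavior the pause/resume test is meant to reveal.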


On Mon, Feb 26, 2018 at 11:14 AM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Julien,
>
> There's actually quite a bit of logic in MCF to run jobs concurrently.
> The problem, though, is that documents are "scheduled" in advance, and that
> scheduling is not readily updateable on the fly.  So if you have a job
> running that has already queued 100,000 documents and then you start
> another job, that job's documents won't get processed until the first job's
> 100,000 documents are processed.  After that the jobs will run concurrently.
>
> This happens because MCF uses a database to manage its queue.  The query
> that locates documents for processing needs to order them by something so
> that documents are handled fairly.  The field that holds this ordering is
> the "docpriority" field, if you are interested.
>
> For connectors that identify all the documents they are going to crawl
> during the seeding phase, this makes it look like jobs are completely
> sequential.  For most connectors, however, that is not the case.
>
> Karl
>
>
> On Mon, Feb 26, 2018 at 11:02 AM, Julien <julien.massiera@francelabs.com>
> wrote:
>
>> Hi MCF community,
>>
>> I was wondering if MCF is able to run several jobs concurrently and if
>> there is a specific configuration to do that.
>>
>> I ask because I created two jobs, one using a file system input
>> repository and one using a JCIFS input repository, and both jobs use the
>> same output (Solr). When I start both of them, execution is sequential:
>> one job seems to wait until the other one is done.
>>
>> I tested this on MCF v2.7.
>>
>> Regards,
>> Julien
>
>
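The queuing behavior described above can be approximated with a small priority-queue model. This is a hypothetical sketch, not ManifoldCF's implementation. `ToyQueue` and `crawl` are illustrative names, and a simple increasing counter stands in for the real "docpriority" value. The model contrasts a connector that queues every document during seeding with one that discovers documents while crawling.

```python
import heapq
import itertools


class ToyQueue:
    """Hypothetical stand-in for a database-backed crawl queue.

    Each document receives a priority exactly once, when it is queued, and
    workers always take the document with the lowest priority value. This
    mimics a query that orders by a "docpriority"-style column; it is NOT
    ManifoldCF's real priority formula.
    """

    def __init__(self):
        self._heap = []
        self._clock = itertools.count()  # priorities only ever increase

    def enqueue(self, job, index):
        heapq.heappush(self._heap, (next(self._clock), job, index))

    def pop(self):
        _, job, index = heapq.heappop(self._heap)
        return job, index

    def __bool__(self):
        return bool(self._heap)


def crawl(seeds, totals, discover):
    """Process every queued document and return the job of each, in order.

    seeds:    (job, index) pairs queued before processing starts
    totals:   total number of documents per job
    discover: if True, processing document i of a job queues document i+1
              (incremental discovery); if False, seeds must list everything
    """
    q = ToyQueue()
    for job, index in seeds:
        q.enqueue(job, index)
    order = []
    while q:
        job, index = q.pop()
        order.append(job)
        if discover and index + 1 < totals[job]:
            q.enqueue(job, index + 1)
    return order


totals = {"A": 5, "B": 3}

# Everything found while seeding: job A queues all 5 documents, then job B
# is started and its 3 documents are queued behind them.
upfront = [("A", i) for i in range(5)] + [("B", i) for i in range(3)]
print(crawl(upfront, totals, discover=False))
# ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B']

# Documents discovered as the crawl proceeds: each job seeds one document.
print(crawl([("A", 0), ("B", 0)], totals, discover=True))
# ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'A']
```

When everything is queued up front, job B only starts once job A's backlog is drained. With incremental discovery, newly found documents from both jobs keep receiving fresh priorities, so the jobs interleave.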
