Mailing-List: contact dev-help@manifoldcf.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@manifoldcf.apache.org
Date: Sat, 16 Apr 2016 20:45:25 +0000 (UTC)
From: "Karl Wright (JIRA)" <jira@apache.org>
To: dev@manifoldcf.apache.org
Message-ID: <JIRA.12959414.1460837005000.248216.1460839525518@Atlassian.JIRA>
In-Reply-To: <JIRA.12959414.1460837005000@Atlassian.JIRA>
References: <JIRA.12959414.1460837005000@Atlassian.JIRA>
 <JIRA.12959414.1460837005129@arcas>
Subject: [jira] [Commented] (CONNECTORS-1299) "Seeding" phase of a job
 prevents starting others?
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244387#comment-15244387 ] 

Karl Wright commented on CONNECTORS-1299:
-----------------------------------------

Hi Konstantin,

What you are seeing is an issue with how documents are scheduled. Documents are allotted priority values at the time they are crawled, and those values are calculated with shared external resources in mind. That is, if two jobs crawl the same resource (as far as the connector defines it), the job management code assigns document priorities with ALL jobs using that resource under consideration.

This leads to some odd effects if you start one job long after you started another. The first job will continue to make progress, while the second job will appear not to. What is actually happening is that the first document from the second job won't be crawled until the first job gets through the documents it had already queued when the second job started.
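To make the effect concrete, here is a minimal Java sketch under loudly stated assumptions: the class `BinPriorityDemo`, its method names, and its priority rule (a document's priority is simply how many documents were already assigned to its busiest bin, fixed when it is queued, lower value crawled first) are hypothetical simplifications, not ManifoldCF's actual scheduling code. It only illustrates why a job started later, sharing a bin with an earlier job, waits behind that job's already-queued documents.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// HYPOTHETICAL, simplified model of bin-based prioritization (NOT ManifoldCF's
// real formula): priority = number of documents already assigned to the
// document's busiest bin; lower values are fetched first.
public class BinPriorityDemo {
    static final class QueuedDoc {
        final String label;
        final long priority;
        final long seq; // insertion order, used only to break ties

        QueuedDoc(String label, long priority, long seq) {
            this.label = label;
            this.priority = priority;
            this.seq = seq;
        }
    }

    // Shared by ALL jobs: how many documents each bin has been assigned so far.
    private final Map<String, Long> binCounts = new HashMap<>();
    private final PriorityQueue<QueuedDoc> queue = new PriorityQueue<>(
        Comparator.comparingLong((QueuedDoc d) -> d.priority).thenComparingLong(d -> d.seq));
    private long nextSeq = 0;

    // Model assumption: the priority is fixed once, when the document is queued.
    public void enqueue(String job, String docId, String[] binNames) {
        long priority = 0;
        for (String bin : binNames) {
            long count = binCounts.getOrDefault(bin, 0L);
            priority = Math.max(priority, count);
            binCounts.put(bin, count + 1);
        }
        queue.add(new QueuedDoc(job + ":" + docId, priority, nextSeq++));
    }

    // Returns documents in the order a single worker would fetch them.
    public List<String> drainOrder() {
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().label);
        }
        return order;
    }

    public static void main(String[] args) {
        BinPriorityDemo demo = new BinPriorityDemo();
        String[] bins = {"fileserver1"}; // both jobs crawl the same server
        for (int i = 0; i < 4; i++) {
            demo.enqueue("jobA", "doc" + i, bins); // job A started first
        }
        demo.enqueue("jobB", "doc0", bins);        // job B started later
        System.out.println(demo.drainOrder());
        // prints [jobA:doc0, jobA:doc1, jobA:doc2, jobA:doc3, jobB:doc0]
    }
}
{code}

In this toy model, if job B's document were in a different bin (say, another server), it would get priority 0 and be fetched right after job A's first document instead of waiting.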

The jcifs connector assigns document bins by server:

{code}
  @Override
  public String[] getBinNames(String documentIdentifier)
  {
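    // Every document from the same SMB server goes into the same bin, so all
    // jobs crawling that server are prioritized together.
    return new String[]{server};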
  }
{code}

... so plan accordingly. Also, I believe this code was fixed only recently; some connectors were not using proper bins and would therefore interfere with each other unnecessarily.
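A second hypothetical sketch, using a similarly simplified counting rule (again not the framework's actual formula), shows why the choice of bin names matters. With per-server bins like the `getBinNames()` above, a job crawling a different server is not held back; if a connector instead put every document into one constant bin (an assumed example of "improper" binning, not taken from any specific connector), unrelated jobs would compete. `BinChoiceDemo`, `firstPriorityOfJobB`, and the server names are made up for illustration.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// HYPOTHETICAL illustration: the bin names a connector returns decide whether
// two jobs compete. Priority = documents already counted in the busiest bin
// (simplified model, not ManifoldCF's real formula).
public class BinChoiceDemo {
    // Job A queues 1000 documents from server "alpha"; job B then queues one
    // document from server "beta". Returns the priority job B's document gets.
    public static long firstPriorityOfJobB(Function<String, String[]> binsForServer) {
        Map<String, Long> binCounts = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            assign(binCounts, binsForServer.apply("alpha"));
        }
        return assign(binCounts, binsForServer.apply("beta"));
    }

    // Computes a new document's priority and records it in each of its bins.
    public static long assign(Map<String, Long> binCounts, String[] bins) {
        long priority = 0;
        for (String bin : bins) {
            long count = binCounts.getOrDefault(bin, 0L);
            priority = Math.max(priority, count);
            binCounts.put(bin, count + 1);
        }
        return priority;
    }

    public static void main(String[] args) {
        // One bin per server, in the spirit of the JCIFS getBinNames() above.
        long perServer = firstPriorityOfJobB(server -> new String[]{server});
        // One constant bin for every document (the hypothetical "improper" case).
        long shared = firstPriorityOfJobB(server -> new String[]{"everything"});
        System.out.println("per-server bins: " + perServer);  // prints 0
        System.out.println("single shared bin: " + shared);   // prints 1000
    }
}
{code}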


> "Seeding" phase of a job prevents starting others?
> --------------------------------------------------
>
>                 Key: CONNECTORS-1299
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1299
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Framework crawler agent
>         Environment: Windows
>            Reporter: Konstantin Avdeev
>
> Hello Karl, could you please clarify if this is a bug or a feature? :)
> When I start an SMB job for a share containing a lot of files (it can be reproduced with a \Windows directory :)) and then start a second job, the second one stays for some time (depending on the amount of data being processed by the first one) with the status "running", but showing {{"Active=1"}}, and does not progress.
> Setting the log level to Debug did not shed any light on this, unfortunately.
> It would be great if you could elaborate on that a little!
> Thank you!


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)