manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1299) "Seeding" phase of a job prevents starting others?
Date Sat, 16 Apr 2016 20:45:25 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244387#comment-15244387 ]

Karl Wright commented on CONNECTORS-1299:
-----------------------------------------

Hi Konstantin,

What you are seeing is an effect of how documents are scheduled.  Documents are allotted priority
values at the time they are crawled.  The priority values are calculated with shared external
resources in mind.  That is, if you have two jobs crawling the same resource (as far as the
connector defines it), then the job management code assigns document priorities with ALL users
of that resource under consideration.

This leads to some odd effects if you start one job long after you started another.  The first
job will continue to make progress, and it will appear as if the second job doesn't.  What is
actually happening is that the first document from the second job won't be crawled until the
first job gets through the documents it had queued at the time the second job started.
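
As a rough illustration of the effect described above, here is a toy model in Java.  It is not
ManifoldCF's actual scheduling code: the class SharedBinDemo, its methods, and the "one running
counter per bin, lowest value first" rule are all assumptions made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT ManifoldCF's real scheduler. All names here are hypothetical.
class SharedBinDemo {
  // One queued document together with the priority value it was handed.
  private static final class Entry {
    final String label;
    final long priority;
    Entry(String label, long priority) { this.label = label; this.priority = priority; }
  }

  // Assumption: each bin keeps a running counter, and a lower value is crawled sooner.
  private final Map<String, Long> nextPriority = new HashMap<>();
  private final List<Entry> queue = new ArrayList<>();

  // Queue a document. Its priority depends on everything already queued in the
  // same bin, regardless of which job queued it.
  void enqueue(String job, String docId, String bin) {
    long priority = nextPriority.merge(bin, 1L, Long::sum);
    queue.add(new Entry(job + ":" + docId, priority));
  }

  // Return "job:docId" labels in the order they would be crawled.
  List<String> crawlOrder() {
    List<Entry> sorted = new ArrayList<>(queue);
    sorted.sort(Comparator.comparingLong((Entry e) -> e.priority));
    List<String> labels = new ArrayList<>();
    for (Entry e : sorted) {
      labels.add(e.label);
    }
    return labels;
  }

  public static void main(String[] args) {
    SharedBinDemo demo = new SharedBinDemo();
    // job1 has already queued five documents against the shared bin...
    for (int i = 0; i < 5; i++) {
      demo.enqueue("job1", "doc" + i, "fileserver");
    }
    // ...when job2 starts and queues its first document against the same bin.
    demo.enqueue("job2", "doc0", "fileserver");
    System.out.println(demo.crawlOrder());
  }
}
```

In this model the single document from job2 sorts behind all five documents that job1 had
already queued, which is the "second job appears stuck" behaviour in miniature.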

The jcifs connector assigns document bins by server:

{code}
  @Override
  public String[] getBinNames(String documentIdentifier)
  {
    return new String[]{server};
  }
{code}

... so plan accordingly.  Also, I believe this code was fixed only recently; some connectors
were not using proper bins and would therefore interfere with each other unnecessarily.
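
To make the consequence of binning by server concrete, here is a small hypothetical sketch.
The class BinByServerDemo and the groupByBin helper are invented for illustration, and this
getBinNames derives the server from an smb:// style identifier only so that the example is
self-contained (the connector snippet above simply returns its configured server).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not part of ManifoldCF.
class BinByServerDemo {
  // Assumption: identifiers look like smb://host/path, and the host is the bin.
  static String[] getBinNames(String documentIdentifier) {
    return new String[]{URI.create(documentIdentifier).getHost()};
  }

  // Group document identifiers by the bin(s) they fall into.
  static Map<String, List<String>> groupByBin(List<String> documentIdentifiers) {
    Map<String, List<String>> bins = new TreeMap<>();
    for (String id : documentIdentifiers) {
      for (String bin : getBinNames(id)) {
        bins.computeIfAbsent(bin, k -> new ArrayList<>()).add(id);
      }
    }
    return bins;
  }

  public static void main(String[] args) {
    List<String> docs = Arrays.asList(
        "smb://fs1/shareA/report.txt",   // queued by one job
        "smb://fs1/shareB/notes.txt",    // queued by another job, same server
        "smb://fs2/shareC/data.txt");    // different server
    // Documents on fs1 share a bin even though they sit on different shares;
    // the document on fs2 is in a bin of its own.
    System.out.println(groupByBin(docs));
  }
}
```

Under this assumption, two jobs that crawl different shares on the same server still land in
the same bin and so compete for priority, while a job against a different server does not.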


> "Seeding" phase of a job prevents starting others?
> --------------------------------------------------
>
>                 Key: CONNECTORS-1299
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1299
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Framework crawler agent
>         Environment: Windows
>            Reporter: Konstantin Avdeev
>
> Hello Karl, could you please clarify if this is a bug or a feature? :)
> When I start an smb job for a share containing a lot of files (can be reproduced with a \Windows directory :)) and then start a second job, the second one stays for some time (depending on the amount of data being processed by the first one) in the status "running", showing {{"Active=1"}}, and does not progress.
> Setting the log level to Debug did not shed any light on this, unfortunately.
> It would be great if you could elaborate on that a little!
> Thank you!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
