Date: Sat, 16 Apr 2016 20:45:25 +0000 (UTC)
From: "Karl Wright (JIRA)"
To: dev@manifoldcf.apache.org
Reply-To: dev@manifoldcf.apache.org
Subject: [jira] [Commented] (CONNECTORS-1299) "Seeding" phase of a job prevents starting others?

[ https://issues.apache.org/jira/browse/CONNECTORS-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244387#comment-15244387 ]

Karl Wright commented on CONNECTORS-1299:
-----------------------------------------

Hi Konstantin,

What you are seeing is an issue with scheduling of documents. Documents are allotted priority values at the time they are crawled. The priority values are calculated with shared external resources in mind.
That is, if you have two jobs crawling the same resource (as far as the connector defines it), then the job management code assigns document priorities with ALL users of that resource under consideration.

This leads to some odd effects if you start one job well after you started another. The first job will continue to make progress, and it will appear as if the second job doesn't. What is actually happening is that the first document from the second job won't be crawled until the first job gets through the documents it had queued at the time the second job started.

The jcifs connector assigns document bins by server:

{code}
@Override
public String[] getBinNames(String documentIdentifier)
{
  // Every document on the same server falls into the same bin
  return new String[]{server};
}
{code}

... so plan accordingly. Also, I believe this code was fixed only recently; some connectors were not using proper bins and would therefore unnecessarily interfere with each other.

> "Seeding" phase of a job prevents starting others?
> --------------------------------------------------
>
>                 Key: CONNECTORS-1299
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1299
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Framework crawler agent
>         Environment: Windows
>            Reporter: Konstantin Avdeev
>
> Hello Karl, could you please clarify if this is a bug or a feature? :)
> When I start an smb job for a share containing a lot of files (can be reproduced with a \Windows directory :)) and then start a second job, the second one stays for some time (depending on the amount of data being processed by the first one) in the status "running", but showing {{"Active=1"}}, and does not progress.
> Setting the log level to Debug did not shed any light on this, unfortunately.
> It would be great if you could elaborate on that a little!
> Thank you!

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
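
As an illustration of the scheduling effect described in the comment above, here is a minimal toy model in Java. It is not ManifoldCF code: the class `BinPrioritySketch`, its methods, and the simple "next slot per bin" counter are all assumptions made for this sketch, and the real priority calculation is more involved. It only shows why a late-starting job that shares a bin with a running job appears stalled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Toy model (hypothetical, not ManifoldCF code) of bin-based document priorities. */
public class BinPrioritySketch {

    /** A queued document together with the priority it was assigned. */
    static final class QueuedDoc {
        final String job;
        final String docId;
        final long priority;

        QueuedDoc(String job, String docId, long priority) {
            this.job = job;
            this.docId = docId;
            this.priority = priority;
        }
    }

    // Assumption: one running counter per bin, shared by ALL jobs using that bin.
    private final Map<String, Long> nextSlotPerBin = new HashMap<>();

    // Lower priority value means "crawl sooner".
    private final PriorityQueue<QueuedDoc> queue =
        new PriorityQueue<>(Comparator.comparingLong((QueuedDoc d) -> d.priority));

    /** Queue a document; its priority is the next free slot in its bin. */
    void enqueue(String job, String docId, String bin) {
        long priority = nextSlotPerBin.merge(bin, 1L, Long::sum);
        queue.add(new QueuedDoc(job, docId, priority));
    }

    /** Crawl the best-priority document, returning "job:docId", or null if empty. */
    String crawlNext() {
        QueuedDoc d = queue.poll();
        return d == null ? null : d.job + ":" + d.docId;
    }

    /** Crawl everything that is left, in priority order. */
    List<String> drain() {
        List<String> order = new ArrayList<>();
        String next;
        while ((next = crawlNext()) != null) {
            order.add(next);
        }
        return order;
    }

    public static void main(String[] args) {
        BinPrioritySketch scheduler = new BinPrioritySketch();

        // Job A queues five documents, all in the same bin (one server).
        for (int i = 1; i <= 5; i++) {
            scheduler.enqueue("A", "a" + i, "fileserver");
        }
        // Two of them are crawled before job B starts.
        scheduler.crawlNext();
        scheduler.crawlNext();

        // Job B starts late and targets the same server, hence the same bin.
        scheduler.enqueue("B", "b1", "fileserver");

        // Prints [A:a3, A:a4, A:a5, B:b1]: B waits for A's already-queued backlog.
        System.out.println(scheduler.drain());
    }
}
```

In this model a document queued into a different bin (a different server, under the jcifs binning shown above) would start from its own counter and would not wait behind job A's backlog, which is the reason the choice of bin names matters.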