Date: Mon, 26 Feb 2018 17:51:00 +0000 (UTC)
From: "Karl Wright (JIRA)"
To: dev@manifoldcf.apache.org
Subject: [jira] [Commented] (CONNECTORS-1497) Re-index seeded modified documents when the re-crawl interval is infinity and connector model is MODEL_ADD_CHANGE

[ https://issues.apache.org/jira/browse/CONNECTORS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377260#comment-16377260 ]

Karl Wright commented on CONNECTORS-1497:
-----------------------------------------

This is the wrong place to put this in any case. Please examine the method signature:

{code}
  /** Add an initial set of documents to the queue.
  * This method is called during job startup, when the queue is being loaded.
  * A set of document references is passed to this method, which updates the status of the document
  * in the specified job's queue, according to specific state rules.
  *@param processID is the current process ID.
  *@param jobID is the job identifier.
  *@param legalLinkTypes is the set of legal link types that this connector generates.
  *@param docIDs are the local document identifiers.
  *@param overrideSchedule is true if any existing document schedule should be overridden.
  *@param hopcountMethod is either accurate, nodelete, or neverdelete.
  *@param documentPriorities are the document priorities corresponding to the document identifiers.
  *@param prereqEventNames are the events that must be completed before each document can be processed.
  */
  @Override
  public void addDocumentsInitial(String processID, Long jobID, String[] legalLinkTypes,
    String[] docIDHashes, String[] docIDs, boolean overrideSchedule,
    int hopcountMethod, IPriorityCalculator[] documentPriorities,
    String[][] prereqEventNames)
    throws ManifoldCFException
{code}

Note the parameter called "overrideSchedule". You want to set that to "true" to override the schedule in the manner you are trying to do.

This method is called during seeding. When this is called during the run of a non-continuous job, overrideSchedule=true already. So the question is whether you want all *continuous* jobs to override the schedule every time they reseed. I'm still not sold that that is the right thing, but assuming it is, then you want to find where that happens (it's a different thread that does continuous job seeding than does initial job seeding) and change that parameter in the addDocumentsInitial() method call there.

> Re-index seeded modified documents when the re-crawl interval is infinity and connector model is MODEL_ADD_CHANGE
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1497
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1497
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework agents process
>    Affects Versions: ManifoldCF 2.9.1
>            Reporter: Ahmed Mahfouz
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: CONNECTORS-1497.patch
>
>
> Trying to avoid a full scan of all documents for a better efficiency with a large number of documents. I tried so many different setting for the Jobs but I couldn't accomplish that.
Especially when the repository connector model is MODEL_ADD_CHANGE, I expected the modified documents that were seeded to be re-indexed immediately, like new seeds, but I found that it uses the re-crawl time as the scheduled time, so they wait for the full scan to get re-indexed. I avoided the full scan by setting the re-crawl interval to infinity, but my modified document seeds were still not getting indexed. After digging into the code for quite some time, I made some modifications to the JobManager and it worked for me. I would like to share the change with you for review, so I opened this ticket.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
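
For readers following the thread, the effect of the overrideSchedule flag discussed in the comment can be illustrated with a toy model. This is only a sketch based on the Javadoc quoted above ("true if any existing document schedule should be overridden"). The class SeedingSketch, its seed() and putExisting() methods, and the use of Long.MAX_VALUE to stand in for an infinite re-crawl interval are all hypothetical names and assumptions made for illustration. None of this is ManifoldCF code, and the real JobManager logic involves more document states than are shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a job queue; NOT the ManifoldCF implementation.
class SeedingSketch {
    // Maps a document ID to the time (ms) at which it is next due for processing.
    private final Map<String, Long> scheduledTime = new HashMap<>();

    // Simulates a document that is already queued with some existing schedule.
    void putExisting(String docID, long when) {
        scheduledTime.put(docID, when);
    }

    // Seeds documents: new documents are always scheduled for "now";
    // already-queued documents are rescheduled only if overrideSchedule is true.
    void seed(String[] docIDs, boolean overrideSchedule, long now) {
        for (String docID : docIDs) {
            if (overrideSchedule || !scheduledTime.containsKey(docID)) {
                scheduledTime.put(docID, now);
            }
        }
    }

    // Returns the scheduled time, or null if the document is not queued.
    Long getScheduledTime(String docID) {
        return scheduledTime.get(docID);
    }

    public static void main(String[] args) {
        SeedingSketch queue = new SeedingSketch();
        // Assumption: an infinite re-crawl interval is modeled as Long.MAX_VALUE.
        queue.putExisting("doc-a", Long.MAX_VALUE);

        // Reseed without override: the modified document keeps its old schedule.
        queue.seed(new String[] {"doc-a", "doc-b"}, false, 1000L);
        System.out.println("doc-a without override: " + queue.getScheduledTime("doc-a"));
        System.out.println("doc-b (new seed): " + queue.getScheduledTime("doc-b"));

        // Reseed with override: the modified document becomes due immediately.
        queue.seed(new String[] {"doc-a"}, true, 2000L);
        System.out.println("doc-a with override: " + queue.getScheduledTime("doc-a"));
    }
}
```

In this model, a reseed with overrideSchedule=false leaves an already-queued document waiting on its old schedule, which matches the behavior the reporter describes. Passing true makes it due immediately, which is the change the comment suggests making at the continuous-job seeding call site.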