Date: Thu, 26 May 2016 13:22:00 +0530
Subject: [Fineract] Scheduler Jobs - Lock Acquisition Issue
From: 
Nazeer Shaik
To: dev@fineract.incubator.apache.org

Hi Devs,

We are facing an issue with our scheduler jobs, and your help would be appreciated.

We have scheduler jobs that run at a configured time for each tenant. Each job belongs to a job group. At server boot-up, we create a Scheduler instance for each job group. When a job is triggered, the corresponding Scheduler instance is picked based on the job's group id.

Before triggering a job, we update its running status to true in the DB so that no other thread runs the same job in parallel. Before updating the running status, we lock the job's record using LockModeType.PESSIMISTIC_WRITE.

The problem is that, for some reason, we are not able to acquire the lock and we get a LockAcquisitionException. After this, all triggers (existing and new) in that group's scheduler enter the BLOCKED state. The only way to recover is to restart the server. The issue is described here:
https://issues.apache.org/jira/browse/FINERACT-145

We now catch that exception outside the transaction scope. With this change, none of the triggers enter the BLOCKED state. Since the job still has to run, we have added code that retries acquiring the lock for some time. If we still cannot acquire the lock, we veto the execution. Please find the modified code below, in org.apache.fineract.infrastructure.jobs.service.SchedulerTriggerListener.

    @Override
    public boolean vetoJobExecution(final Trigger trigger, final JobExecutionContext context) {
        final String tenantIdentifier = trigger.getJobDataMap().getString(SchedulerServiceConstants.TENANT_IDENTIFIER);
        final FineractPlatformTenant tenant = this.tenantDetailsService.loadTenantById(tenantIdentifier);
        ThreadLocalContextUtil.setTenant(tenant);

        final JobKey key = trigger.getJobKey();
        final String jobKey = key.getName() + SchedulerServiceConstants.JOB_KEY_SEPERATOR + key.getGroup();

        String triggerType = SchedulerServiceConstants.TRIGGER_TYPE_CRON;
        if (context.getMergedJobDataMap().containsKey(SchedulerServiceConstants.TRIGGER_TYPE_REFERENCE)) {
            triggerType = context.getMergedJobDataMap().getString(SchedulerServiceConstants.TRIGGER_TYPE_REFERENCE);
        }

        Integer maxNumberOfRetries = ThreadLocalContextUtil.getTenant().getConnection().getMaxRetriesOnDeadlock();
        Integer maxIntervalBetweenRetries = ThreadLocalContextUtil.getTenant().getConnection().getMaxIntervalBetweenRetries();
        Integer numberOfRetries = 0;
        boolean proceedJobExecution = false;

        while (numberOfRetries <= maxNumberOfRetries) {
            try {
                // This call checks whether the job is currently running. If it is not,
                // the job is locked and its running status is updated.
                proceedJobExecution = this.schedularService.processJobDetailForExecution(jobKey, triggerType);
                // Success: push the counter past the limit to leave the loop.
                numberOfRetries = maxNumberOfRetries + 1;
            } catch (LockAcquisitionException exception) {
                logger.debug("Not able to acquire the lock to update job running status for JobKey: " + jobKey);
                try {
                    // Wait 1 second plus a random number of whole seconds
                    // (0..maxIntervalBetweenRetries) before the next attempt.
                    Random random = new Random();
                    int randomNum = random.nextInt(maxIntervalBetweenRetries + 1);
                    Thread.sleep(1000 + (randomNum * 1000));
                    numberOfRetries = numberOfRetries + 1;
                } catch (InterruptedException e) {
                }
            }
        }
        return proceedJobExecution;
    }

Can you please review this solution and let me know if you have any other approach to fix this?

Thanks,
Nazeer