From: Suresh Kumar Anaparti
Date: Wed, 23 Jan 2019 23:43:19 +0530
Subject: Re: Help! Jobs stuck in pending state
To: dev

Hi Alireza,

The *sync_queue* table is the actual VM sync queue. It holds a queue id for
each VM (*sync_objtype*: VmWorkJobQueue, *sync_objid*: ), and the VM's jobs
reside in the *sync_queue_item* table against that queue id. Only one running
job is allowed per VM queue (*queue_size_limit*: 1 in the *sync_queue* table).
The active/running job has *queue_proc_id*, *queue_proc_number* and
*queue_proc_time* set in the *sync_queue_item* table, and the remaining jobs
with that queue id wait for the active job to complete. So, to delete pending
jobs, the records in the *sync_queue_item* table have to be cleared for the
respective VMs, not the records in the *sync_queue* table.

I think that, in your case, the snapshots are taking a long time, and the
other jobs for that VM have been pending for a long time because they are
queued behind the snapshot job.

What are the config values set for "job.cancel.threshold.minutes",
"job.expire.minutes" and "volume.snapshot.job.cancel.threshold"? Are the
jobs cancelled after the threshold time?

Thanks,
Suresh

On Wed, Jan 23, 2019 at 7:14 PM Andrei Mikhailovsky wrote:

> Hi
>
> I've had this issue a few times in 2018 and managed to get it fixed pretty
> easily, although I had spent a number of hours initially trying to figure out
> WTF is going on. This issue looks like one of those artefacts that crept
> up in one of the versions released in 2018 and hasn't been addressed by the
> dev team.
>
> The way I fixed it was similar to what has been recommended earlier.
> However, the difference was that I am sure I've looked at more tables than
> just the two suggested. Basically, I've stopped the management server,
> created the sql backup, connected to the sql db and listed all tables.
> Grepped for words like job/schedule/queue/sync. After that I went
> through all the tables and pretty much removed all the past / active /
> awaiting execution jobs. I started by looking at the vm related jobs
> (the vm that I've tried to start but wasn't able to). This worked once,
> but the second time I had to remove a lot more jobs which relate to other
> vms. After that I started the management server and all went well from
> there.
>
> What I have also noticed is that my snapshot jobs (I use KVM and Ceph)
> seem to be blocking jobs on the hypervisor hosts which are running these
> snapshots. So, if I am trying to perform various vm related jobs on a host
> server which is currently running a snapshot process, that job will not be
> executed until the snapshot process is done. I've tested this countless
> times and it's still the case. Again, this issue appeared in one
> of the 2018 releases, as I never saw it between 2012 - 2017.
>
> Both issues are annoying as hell!
>
> Cheers
>
> ----- Original Message -----
> > From: "Alireza Eskandari"
> > To: "dev"
> > Sent: Wednesday, 23 January, 2019 12:40:48
> > Subject: Re: Help! Jobs stuck in pending state
>
> > I'm following this issue in github:
> > https://github.com/apache/cloudstack/issues/3104
> > Please leave your comments
> > Thanks
> >
> > On Wed, Jan 23, 2019 at 12:39 PM Wei ZHOU wrote:
> >
> >> Hi Alireza,
> >>
> >> could you try again after restarting mgt server ?
> >>
> >> -Wei
> >>
> >> Alireza Eskandari wrote on Wed, Jan 23, 2019 at 6:22 AM:
> >>
> >> > First I deleted two jobs which existed in the vm_work_job table and
> >> > their related entries in the sync_queue table, but it didn't help.
> >> > Then I deleted all the entries in the sync_queue table and again no
> >> > success.
> >> > Any idea?
> >> >
> >> > On Wed, Jan 23, 2019 at 1:50 AM Wei ZHOU wrote:
> >> >
> >> > > If you know the instance id and mysql password, it should work after
> >> > > removing some records in mysql.
> >> > >
> >> > > ```
> >> > > set @id=XXXXX;
> >> > >
> >> > > delete from vm_work_job where vm_instance_id=@id;
> >> > > delete from sync_queue where sync_objid=@id;
> >> > > ```
> >> > >
> >> > > Alireza Eskandari wrote on Tue, Jan 22, 2019 at 10:59 PM:
> >> > >
> >> > > > Hi guys
> >> > > > I have opened a bug in jira about my problem in CS:
> >> > > > https://issues.apache.org/jira/browse/CLOUDSTACK-10401
> >> > > > CloudStack doesn't process jobs! My cloud is totally unusable.
> >> > > > Thanks in advance for your help.
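
A minimal sketch of the queue model described at the top of this message, using an in-memory SQLite database as a stand-in for the CloudStack MySQL schema. The names `sync_queue`, `sync_queue_item`, `sync_objtype`, `sync_objid`, `queue_size_limit` and `queue_proc_number` come from the thread; the `id` and `queue_id` columns, the use of NULL for a job that no worker has picked up yet, and the VM id 42 are assumptions made for illustration only. It shows that clearing waiting rows in `sync_queue_item` leaves the queue row in `sync_queue` untouched.

```python
import sqlite3

# Simplified stand-in tables; the real CloudStack schema has more columns,
# and `queue_id` is an assumed column name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sync_queue (
    id INTEGER PRIMARY KEY,
    sync_objtype TEXT,
    sync_objid INTEGER,
    queue_size_limit INTEGER
);
CREATE TABLE sync_queue_item (
    id INTEGER PRIMARY KEY,
    queue_id INTEGER,
    queue_proc_number INTEGER  -- assumed NULL while the job is waiting
);
""")

VM_ID = 42  # hypothetical VM instance id
conn.execute("INSERT INTO sync_queue VALUES (1, 'VmWorkJobQueue', ?, 1)", (VM_ID,))
conn.executemany(
    "INSERT INTO sync_queue_item VALUES (?, 1, ?)",
    [(10, 1), (11, None), (12, None)],  # one running job, two waiting
)

def pending_items(conn, vm_id):
    """Return ids of queue items for this VM that no worker has picked up."""
    rows = conn.execute(
        """
        SELECT i.id FROM sync_queue_item i
        JOIN sync_queue q ON q.id = i.queue_id
        WHERE q.sync_objtype = 'VmWorkJobQueue'
          AND q.sync_objid = ?
          AND i.queue_proc_number IS NULL
        ORDER BY i.id
        """,
        (vm_id,),
    )
    return [r[0] for r in rows]

def clear_pending(conn, vm_id):
    """Delete the waiting items for this VM but keep the queue row itself."""
    conn.execute(
        """
        DELETE FROM sync_queue_item
        WHERE queue_proc_number IS NULL
          AND queue_id IN (
              SELECT id FROM sync_queue
              WHERE sync_objtype = 'VmWorkJobQueue' AND sync_objid = ?
          )
        """,
        (vm_id,),
    )

before = pending_items(conn, VM_ID)
clear_pending(conn, VM_ID)
after = pending_items(conn, VM_ID)
print(before, after)
```

Against a real deployment the same SELECT would be the safer first step, run on a database backup with the management server stopped, as suggested elsewhere in this thread.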