From: Min Chen
To: Darren Shepherd, "dev@cloudstack.apache.org"
Subject: Re: race conditions in VolumeServiceImpl.createBaseImageAsync() creates NPE
Date: Fri, 1 Nov 2013 00:29:34 +0000

Darren,

I just checked the code, and you are right. If one thread throws an exception while downloading the template to primary storage, it deletes the entry in template_store_ref, which causes the second thread to fail with an NPE. We need to fix this in 4.3. Please file a bug for this.

Thanks
-min

From: Darren Shepherd
Date: Thursday, October 31, 2013 11:39 AM
To: "dev@cloudstack.apache.org", Min Chen
Subject: race conditions in VolumeServiceImpl.createBaseImageAsync() creates NPE

The following code throws an NPE when a concurrent attempt to stage the same template fails:

    templatePoolRef = _tmpltPoolDao.acquireInLockTable(templatePoolRefId, storagePoolMaxWaitSeconds);
    if (templatePoolRef == null) {
        if (s_logger.isDebugEnabled()) {
            s_logger.info("Unable to acquire lock on VMTemplateStoragePool " + templatePoolRefId);
        }
        templatePoolRef = _tmpltPoolDao.findByPoolTemplate(dataStore.getId(), template.getId());
        if (templatePoolRef.getState() == ObjectInDataStoreStateMachine.State.Ready) {
            s_logger.info("Unable to acquire lock on VMTemplateStoragePool " + templatePoolRefId
                    + ", But Template " + template.getUniqueName()
                    + " is already copied to primary storage, skip copying");
            createVolumeFromBaseImageAsync(volume, templateOnPrimaryStoreObj, dataStore, future);
            return;
        }
        throw new CloudRuntimeException("Unable to acquire lock on VMTemplateStoragePool: " + templatePoolRefId);
    }

If two threads are trying to stage the same template, thread one gets the lock and thread two waits.
If thread one fails to stage the template, it deletes the templatePoolRef from the database. Thread two then gets the lock in op_lock, but the internal findById does not find the templatePoolRef because it has been deleted, so acquireInLockTable() returns null. Technically thread two holds the lock, but the templatePoolRef row was not found. The subsequent line "templatePoolRef = _tmpltPoolDao.findByPoolTemplate(...)" also returns null because the row no longer exists, and on the next line templatePoolRef.getState() throws an NPE.

Darren
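The failure sequence Darren describes can be reproduced outside CloudStack with a minimal, single-threaded Java sketch. This is an assumption-laden model, not CloudStack code: TemplateStagingSketch, refTable, lockTimedOut, stageUnguarded and stageGuarded are hypothetical names; the two lookup methods only imitate the null-returning behaviour described in this thread; and IllegalStateException stands in for CloudRuntimeException. The "thread one fails" step is simulated by deleting the row before the second caller runs, which makes the interleaving deterministic.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the race; NOT the real CloudStack DAO.
class TemplateStagingSketch {

    enum State { Allocated, Ready }

    static final class TemplatePoolRef {
        volatile State state;

        TemplatePoolRef(State state) {
            this.state = state;
        }
    }

    // Hypothetical stand-in for the template/storage-pool ref table.
    static final Map<Long, TemplatePoolRef> refTable = new ConcurrentHashMap<>();
    // Hypothetical: ref ids whose lock wait should be treated as timed out.
    static final Set<Long> lockTimedOut = ConcurrentHashMap.newKeySet();

    // Models the behaviour described in the thread: null is returned both when
    // the lock wait times out AND when the lock is obtained but the row is gone.
    static TemplatePoolRef acquireInLockTable(long refId) {
        if (lockTimedOut.contains(refId)) {
            return null;
        }
        return refTable.get(refId);
    }

    static TemplatePoolRef findByPoolTemplate(long refId) {
        return refTable.get(refId);
    }

    // Same shape as the quoted code: the second lookup is dereferenced unchecked.
    static String stageUnguarded(long refId) {
        TemplatePoolRef ref = acquireInLockTable(refId);
        if (ref == null) {
            ref = findByPoolTemplate(refId);
            if (ref.state == State.Ready) { // NullPointerException if the row was deleted
                return "skip-copy";
            }
            throw new IllegalStateException("Unable to acquire lock on ref " + refId);
        }
        return "copy";
    }

    // One possible defensive variant: report a missing row explicitly.
    static String stageGuarded(long refId) {
        TemplatePoolRef ref = acquireInLockTable(refId);
        if (ref == null) {
            ref = findByPoolTemplate(refId);
            if (ref == null) {
                throw new IllegalStateException("Template pool ref " + refId
                        + " no longer exists; a concurrent copy probably failed and removed it");
            }
            if (ref.state == State.Ready) {
                return "skip-copy";
            }
            throw new IllegalStateException("Unable to acquire lock on ref " + refId);
        }
        return "copy";
    }

    public static void main(String[] args) {
        long refId = 42L;
        refTable.put(refId, new TemplatePoolRef(State.Allocated));
        refTable.remove(refId); // thread one failed to stage and deleted the row
        try {
            stageUnguarded(refId);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        try {
            stageGuarded(refId);
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

The guarded variant only turns the NPE into a descriptive error. Whether thread two should instead retry the copy once the row has disappeared is a design decision this sketch does not settle.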