Return-Path: X-Original-To: apmail-cloudstack-dev-archive@www.apache.org Delivered-To: apmail-cloudstack-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A0D62101C8 for ; Thu, 3 Apr 2014 13:53:02 +0000 (UTC) Received: (qmail 7401 invoked by uid 500); 3 Apr 2014 13:52:59 -0000 Delivered-To: apmail-cloudstack-dev-archive@cloudstack.apache.org Received: (qmail 6599 invoked by uid 500); 3 Apr 2014 13:52:55 -0000 Mailing-List: contact dev-help@cloudstack.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cloudstack.apache.org Delivered-To: mailing list dev@cloudstack.apache.org Received: (qmail 5097 invoked by uid 99); 3 Apr 2014 13:52:51 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 03 Apr 2014 13:52:51 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of daan.hoogland@gmail.com designates 209.85.213.46 as permitted sender) Received: from [209.85.213.46] (HELO mail-yh0-f46.google.com) (209.85.213.46) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 03 Apr 2014 13:52:47 +0000 Received: by mail-yh0-f46.google.com with SMTP id b6so1677700yha.5 for ; Thu, 03 Apr 2014 06:52:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; bh=Lp6Q8qmQmZE7JXJoDhtVBsC8n3/JB57i2Xv2atUbn/o=; b=D68asWVWn8AjqWU+nBaPZZKnGoB82xmhJezLaIuHSGHcXYfEmMjiV3gG1KqMCOBmFO TYm17upmeyGI1EzS3yzdChHFbW8ii3WPB2RcQCTl9NJOB5AVQRI2ma65bKgSWuXIN7jP 1ToFmAJWv+cVBJb84Myce/m9XssUGPG7VTdCrRdofS8uswOiJrk4Vg1X4U7L0Fm0+H/5 5dbUdslOCRJxukBtZHBrk0xbuuINykaVIcX/WwMOs2FJXgqJCeXyiuxVJ/dPWrc5KBXm HBFI4AIR789HGkOTsWwX/TaLuKgPYVXPi5QvsiwdoleypLeykebHZh8erAhe6cpgkg57 Qugg== X-Received: by 10.236.220.72 with SMTP id n68mr4170913yhp.102.1396533146524; Thu, 03 Apr 2014 06:52:26 -0700 (PDT) MIME-Version: 1.0 Received: by 10.170.146.66 with HTTP; Thu, 3 Apr 2014 06:52:06 -0700 (PDT) In-Reply-To: References: <20140402135547.2272.43837@reviews.apache.org> <20140402183144.9496.69359@reviews.apache.org> <890E857B-FDC4-4925-A7BC-6FFD227E5C02@eecg.toronto.edu> From: Daan Hoogland Date: Thu, 3 Apr 2014 15:52:06 +0200 Message-ID: Subject: Re: Review Request 19917: Improvements on exception handlers (JIRA-6242) To: Ding Yuan Cc: Alena Prokharchyk , dev Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org Ding, I think we can dare to do so in master as it will not see release for a while. We'll just have to be aware of the locations and be on alert for any stacktraces that will pass this list. I would not like to do this on the 4.4 branch even when it is an improvement of code quality as such. It might do things or prevent things from happening that we need done. I don't see a new version of the diff in the review request. Did you 'Update' -> 'Upload Diff'? regards, Daan On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan wrote: > Uploaded a new patch to 19917. Changed the verbosity to debug, and addres= sed > Daan's comment on providing more distinctive text messages. > > Sorry that I haven't split them into smaller patches. > > Note in a few cases the original code was like: > try { > pstmt =3D txn.prepareAutoCloseStatement(sql); > String gmtCutTime =3D > DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime); > pstmt.setString(1, gmtCutTime); > pstmt.setString(2, gmtCutTime); > > ResultSet rs =3D pstmt.executeQuery(); > while (rs.next()) { > RunningHostCountInfo info =3D new RunningHostCountInfo(); > info.setDcId(rs.getLong(1)); > info.setHostType(rs.getString(2)); > info.setCount(rs.getInt(3)); > > l.add(info); > } > } catch (SQLException e) { > } catch (Throwable e) { > } > > The try block only throws SQLException as checked exception, and this cod= e > would also swallow any unchecked exceptions. I removed the catch (Throwab= le) > in these cases to avoid potentially swallowing any unexpected runtime > exceptions. Please let me know if this is not desirable so I can further > update. > > Thanks, > Ding > > On Apr 2, 2014, at 5:17 PM, Ding Yuan wrote: > > Thanks all for the quick comments! > If i understand the discussion correctly, I will just change all the adde= d > log printing statements to debug verbosity. I will upload a new patch for > that shortly. > > Now a bit back story: the reason we are doing this is that we recently di= d > an analysis on many bugs collected from JIRA to understand why today's cl= oud > system fails. And we found that almost all of the cluster-wide failures a= re > caused by buggy exception handling, which would often turn a component > failure into a cluster-wide one. One of the common bug pattern is ignorin= g > some exceptions -- allowing them to propagate and finally turn into disas= ter. > Therefore we built a simple static checking tool just to check some simpl= e > rules for exception handling, such as if an exception is ignored. > Admittedly, it would be much harder to reason about the potential > consequences caused for ignoring a certain exception, that's why without > much more domain knowledge I can only recommend to 1) avoid over-catching= an > exception, especially when the handling logic will swallow it, and 2) log > them, as what this patch does. > > Nevertheless, it seems the four cases I mentioned in JIRA-6242 are > particularly suspicious. It might be worthwhile to double check their > correctness if you have time. I am reposting them below. > > Thanks! > Ding > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > Case 1: > Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java" > > 326: try { > 327: String myIp =3D getGreEndpointIP(dest.getHost(), nw); > .. .. .. > 373: } catch (Exception e) { > 374: // I really thing we should do a better handling of these exceptions > 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e); > 376: } > > Comment seems to suggest for a better handling. > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > Case 2: > Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java" > > 2284: try { > 2285: /* > 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't > 2287: * return if propagation return non null > 2288: */ > 2289: Boolean result =3D _clusterMgr.propagateResourceEvent(h.getId(), > ResourceState.Event.UpdatePassword); > 2290: if (result !=3D null) { > 2291: return result; > 2292: } > 2293: > 2294: doUpdateHostPassword(h.getId()); > 2295: } catch (AgentUnavailableException e) { > 2296: } > > Seem from the comment the logic should be fixed. > A similar code snipeet is at: > Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java" > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > Case 3: > > Line: 184, File: > "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupC= md.java" > > 174: try > 175: { > 176: success =3D _autoScaleService.configureAutoScaleVmGroup(this); > 177: if (success) { > 178: vmGroup =3D _entityMgr.findById(AutoScaleVmGroup.class, getEntityId(= )); > 179: AutoScaleVmGroupResponse responseObject =3D > _responseGenerator.createAutoScaleVmGroupResponse(vmGroup); > 180: setResponseObject(responseObject); > 181: responseObject.setResponseName(getCommandName()); > 182: } > 183: } catch (Exception ex) { > 184: // TODO what will happen if Resource Layer fails in a step inbetween > 185: s_logger.warn("Failed to create autoscale vm group", ex); > 186: } > > The comment, "TODO", seems to suggest for better handling. > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > Case 4: > > Line: 222, File: "com/cloud/api/ApiDispatcher.java" > This snippet is moved to ParamProcessWorker.java in the trunk. > > 203: try { > 204: setFieldValue(field, cmd, paramObj, parameterAnnotation); > .. .. > 220: } catch (CloudRuntimeException cloudEx) { > 221: s_logger.error("CloudRuntimeException", cloudEx); > 222: // FIXME: Better error message? This only happens if the API command= is > not executable, which typically > 223: //means > 224: // there was > 225: // and IllegalAccessException setting one of the parameters. > 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal > error executing API command " + cmd.getCommandName().substring(0, > cmd.getCommandName().length() - 8)); > 227: } > > The "FIXME" comment seems to suggest for getter error message. > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > On Apr 2, 2014, at 4:37 PM, Daan Hoogland wrote= : > > Ding Yuan, > > Any objections to that? > > On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk > wrote: > > > > On 4/2/14, 1:27 PM, "Daan Hoogland" wrote: > > I think we agree indeed. Doesn't mean we should start this discuss > thread or write a arch guideline on the wiki somewhere. Maybe Ding > Yuan wants to do a preliminary version of it? > > > Wiki guide would be useful indeed. > > > > In the meantime I don't think that it hurts for the present patch to > do everything in debug and decide about higher levels needed later. > > > Agree. > > > > regards, > > On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk > wrote: > > Daan, > > Correct me if I=B9m wrong, but all of the logging added by Ding, fall > under > "to go with it or to indicate passing a certain code path=B2. I=B9ve just > noticed that some of them were added with DEBUG, and some with WARN > level, > and wanted to correct that. > > So we should: > > 1) For sure: never print them out in WARN as there is nothing admin > should > do in this case, because the code just handles them by ignoring. > 2) Figure out what would be the correct level to log them with: INFO or > DEBUG > > From =B3Logging best practices=B2 articles, I can see that people use INF= O > as > a =B3storyline=B2 of normal application behavior, and DEBUG for sort of > information that helps you to track down the failure cases scenarios. > To > me, stuff added by Ding, falls under second category. But I might be > wrong > as I don=B9t recall on the spot any discussions happening on the debug > topic, from the mailing list. > > -Alena. > > > On 4/2/14, 12:57 PM, "Daan Hoogland" wrote: > > Alena, > > What I read in your comment is a description of INFO vs WARN. Debug > would be only for outputting stacktraces to go with it or to indicate > passing a certain code path. > > Agree? > > On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk > wrote: > > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/19917/#review39324 > ----------------------------------------------------------- > > > Is there a reason why logs for some exceptions are being logged in > DEBUG mode, and some in WARN? From my point of view, if the code only > catches it and doesn't error out, it should be logged in DEBUG. Lots of > Admins are seeking for WARN statements in the log, and they might be > confused seeing WARN w/o further failure or retry. > > - Alena Prokharchyk > > > On April 2, 2014, 1:55 p.m., Ding Yuan wrote: > > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/19917/ > ----------------------------------------------------------- > > (Updated April 2, 2014, 1:55 p.m.) > > > Review request for cloudstack. > > > Repository: cloudstack-git > > > Description > ------- > > This is the patch for JIRA-6242. See > https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more > details. > Thanks! > > > Diffs > ----- > > > engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java > 0d41bc1 > > engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager > Im > pl.java 01508a4 > > engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java > 3e088db > > engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit > y/ > api/db/dao/EngineDataCenterDaoImpl.java 4b6818e > engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f > engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d > engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java > e42eaf4 > engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java > 34fdca5 > engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java > 58dd916 > engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0 > engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java > 1f382d6 > > engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM > an > agerImpl.java 6ed1274 > > framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla > ss > Registry.java 83c8a42 > > plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer > ve > rDiscoverer.java 0ad6dc4 > > plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe > rC > onnectionPool.java b779085 > > plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe > rS > torageProcessor.java e512046 > > plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat > as > tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a > server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba > server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8 > > services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT > hu > mbnailHandler.java 06f21d3 > utils/src/com/cloud/utils/net/NetUtils.java 6350986 > > Diff: https://reviews.apache.org/r/19917/diff/ > > > Testing > ------- > > > Thanks, > > Ding Yuan > > > > > > > -- > Daan > > > > > > -- > Daan > > > > > > -- > Daan > > > --=20 Daan