cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daan Hoogland <daan.hoogl...@gmail.com>
Subject Re: Review Request 19917: Improvements on exception handlers (JIRA-6242)
Date Thu, 03 Apr 2014 13:52:06 GMT
Ding,

I think we can dare to do so in master as it will not see release for
a while. We'll just have to be aware of the locations and be on alert
for any stacktraces that will pass this list. I would not like to do
this on the 4.4 branch even when it is an improvement of code quality
as such. It might do things or prevent things from happening that we
need done.

I don't see a new version of the diff in the review request. Did you
'Update' -> 'Upload Diff'?

regards,
Daan

On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <yuan@ece.utoronto.ca> wrote:
> Uploaded a new patch to 19917. Changed the verbosity to debug, and addressed
> Daan's comment on providing more distinctive text messages.
>
> Sorry that I haven't split them into smaller patches.
>
> Note in a few cases the original code was like:
>          try {
>             pstmt = txn.prepareAutoCloseStatement(sql);
>             String gmtCutTime =
> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime);
>             pstmt.setString(1, gmtCutTime);
>             pstmt.setString(2, gmtCutTime);
>
>             ResultSet rs = pstmt.executeQuery();
>             while (rs.next()) {
>                 RunningHostCountInfo info = new RunningHostCountInfo();
>                 info.setDcId(rs.getLong(1));
>                 info.setHostType(rs.getString(2));
>                 info.setCount(rs.getInt(3));
>
>                 l.add(info);
>                        }
>                  } catch (SQLException e) {
>                  } catch (Throwable e) {
>                  }
>
> The try block only throws SQLException as checked exception, and this code
> would also swallow any unchecked exceptions. I removed the catch (Throwable)
> in these cases to avoid potentially swallowing any unexpected runtime
> exceptions. Please let me know if this is not desirable so I can further
> update.
>
> Thanks,
> Ding
>
> On Apr 2, 2014, at 5:17 PM, Ding Yuan <yuan@eecg.toronto.edu> wrote:
>
> Thanks all for the quick comments!
> If i understand the discussion correctly, I will just change all the added
> log printing statements to debug verbosity. I will upload a new patch for
> that shortly.
>
> Now a bit back story: the reason we are doing this is that we recently did
> an analysis on many bugs collected from JIRA to understand why today's cloud
> system fails. And we found that almost all of the cluster-wide failures are
> caused by buggy exception handling, which would often turn a component
> failure into a cluster-wide one. One of the common bug pattern is ignoring
> some exceptions -- allowing them to propagate and finally turn into disaster.
> Therefore we built a simple static checking tool just to check some simple
> rules for exception handling, such as if an exception is ignored.
> Admittedly, it would be much harder to reason about the potential
> consequences caused for ignoring a certain exception, that's why without
> much more domain knowledge I can only recommend to 1) avoid over-catching an
> exception, especially when the handling logic will swallow it, and 2) log
> them, as what this patch does.
>
> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are
> particularly suspicious. It might be worthwhile to double check their
> correctness if you have time. I am reposting them below.
>
> Thanks!
> Ding
>
> =========================
> Case 1:
> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java"
>
> 326: try {
> 327: String myIp = getGreEndpointIP(dest.getHost(), nw);
>   .. .. ..
> 373: } catch (Exception e) {
> 374: // I really thing we should do a better handling of these exceptions
> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e);
> 376: }
>
> Comment seems to suggest for a better handling.
> =========================
> =========================
> Case 2:
> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java"
>
> 2284: try {
> 2285: /*
> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't
> 2287: * return if propagation return non null
> 2288: */
> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(),
> ResourceState.Event.UpdatePassword);
> 2290: if (result != null) {
> 2291: return result;
> 2292: }
> 2293:
> 2294: doUpdateHostPassword(h.getId());
> 2295: } catch (AgentUnavailableException e) {
> 2296: }
>
> Seem from the comment the logic should be fixed.
> A similar code snipeet is at:
>   Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java"
> =========================
>
> =========================
> Case 3:
>
>   Line: 184, File:
> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java"
>
> 174: try
> 175: {
> 176: success = _autoScaleService.configureAutoScaleVmGroup(this);
> 177: if (success) {
> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId());
> 179: AutoScaleVmGroupResponse responseObject =
> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup);
> 180: setResponseObject(responseObject);
> 181: responseObject.setResponseName(getCommandName());
> 182: }
> 183: } catch (Exception ex) {
> 184: // TODO what will happen if Resource Layer fails in a step inbetween
> 185: s_logger.warn("Failed to create autoscale vm group", ex);
> 186: }
>
> The comment, "TODO", seems to suggest for better handling.
> =========================
> =========================
> Case 4:
>
>   Line: 222, File: "com/cloud/api/ApiDispatcher.java"
> This snippet is moved to ParamProcessWorker.java in the trunk.
>
> 203: try {
> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation);
>      .. ..
> 220: } catch (CloudRuntimeException cloudEx) {
> 221: s_logger.error("CloudRuntimeException", cloudEx);
> 222: // FIXME: Better error message? This only happens if the API command is
> not executable, which typically
> 223: //means
> 224: // there was
> 225: // and IllegalAccessException setting one of the parameters.
> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal
> error executing API command " + cmd.getCommandName().substring(0,
> cmd.getCommandName().length() - 8));
> 227: }
>
> The "FIXME" comment seems to suggest for getter error message.
> =========================
> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogland@gmail.com> wrote:
>
> Ding Yuan,
>
> Any objections to that?
>
> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk
> <Alena.Prokharchyk@citrix.com> wrote:
>
>
>
> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogland@gmail.com> wrote:
>
> I think we agree indeed. Doesn't mean we should start this discuss
> thread or write a arch guideline on the wiki somewhere. Maybe Ding
> Yuan wants to do a preliminary version of it?
>
>
> Wiki guide would be useful indeed.
>
>
>
> In the meantime I don't think that it hurts for the present patch to
> do everything in debug and decide about higher levels needed later.
>
>
> Agree.
>
>
>
> regards,
>
> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk
> <Alena.Prokharchyk@citrix.com> wrote:
>
> Daan,
>
> Correct me if I¹m wrong, but all of the logging added by Ding, fall
> under
> "to go with it or to indicate passing a certain code path². I¹ve just
> noticed that some of them were added with DEBUG, and some with WARN
> level,
> and wanted to correct that.
>
> So we should:
>
> 1) For sure: never print them out in WARN as there is nothing admin
> should
> do in this case, because the code just handles them by ignoring.
> 2) Figure out what would be the correct level to log them with: INFO or
> DEBUG
>
> From ³Logging best practices² articles, I can see that people use INFO
> as
> a ³storyline² of normal application behavior, and DEBUG for sort of
> information that helps you to track down the failure cases scenarios.
> To
> me, stuff added by Ding, falls under second category. But I might be
> wrong
> as I don¹t recall on the spot any discussions happening on the debug
> topic, from the mailing list.
>
> -Alena.
>
>
> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogland@gmail.com> wrote:
>
> Alena,
>
> What I read in your comment is a description of INFO vs WARN. Debug
> would be only for outputting stacktraces to go with it or to indicate
> passing a certain code path.
>
> Agree?
>
> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk
> <alena.prokharchyk@citrix.com> wrote:
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19917/#review39324
> -----------------------------------------------------------
>
>
> Is there a reason why logs for some exceptions are being logged in
> DEBUG mode, and some in WARN? From my point of view, if the code only
> catches it and doesn't error out, it should be logged in DEBUG. Lots of
> Admins are seeking for WARN statements in the log, and they might be
> confused seeing WARN w/o further failure or retry.
>
> - Alena Prokharchyk
>
>
> On April 2, 2014, 1:55 p.m., Ding Yuan wrote:
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19917/
> -----------------------------------------------------------
>
> (Updated April 2, 2014, 1:55 p.m.)
>
>
> Review request for cloudstack.
>
>
> Repository: cloudstack-git
>
>
> Description
> -------
>
> This is the patch for JIRA-6242. See
> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more
> details.
> Thanks!
>
>
> Diffs
> -----
>
>
> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java
> 0d41bc1
>
> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager
> Im
> pl.java 01508a4
>
> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java
> 3e088db
>
> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit
> y/
> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e
>  engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f
>  engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d
>  engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java
> e42eaf4
>  engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java
> 34fdca5
>  engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java
> 58dd916
>  engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0
>  engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java
> 1f382d6
>
> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM
> an
> agerImpl.java 6ed1274
>
> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla
> ss
> Registry.java 83c8a42
>
> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer
> ve
> rDiscoverer.java 0ad6dc4
>
> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
> rC
> onnectionPool.java b779085
>
> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
> rS
> torageProcessor.java e512046
>
> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat
> as
> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a
>  server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba
>  server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8
>
> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT
> hu
> mbnailHandler.java 06f21d3
>  utils/src/com/cloud/utils/net/NetUtils.java 6350986
>
> Diff: https://reviews.apache.org/r/19917/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Ding Yuan
>
>
>
>
>
>
> --
> Daan
>
>
>
>
>
> --
> Daan
>
>
>
>
>
> --
> Daan
>
>
>



-- 
Daan

Mime
View raw message