cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ding Yuan <y...@ece.utoronto.ca>
Subject Re: Review Request 19917: Improvements on exception handlers (JIRA-6242)
Date Thu, 03 Apr 2014 15:26:30 GMT
Oops, sorry I didn’t publish my diff. I just published it on review board. 
Thanks for the comment Daan! Please let me know if I further need to improve it.

Ding

On Apr 3, 2014, at 9:52 AM, Daan Hoogland <daan.hoogland@gmail.com> wrote:

> Ding,
> 
> I think we can dare to do so in master as it will not see release for
> a while. We'll just have to be aware of the locations and be on alert
> for any stacktraces that will pass this list. I would not like to do
> this on the 4.4 branch even when it is an improvement of code quality
> as such. It might do things or prevent things from happening that we
> need done.
> 
> I don't see a new version of the diff in the review request. Did you
> 'Update' -> 'Upload Diff'?
> 
> regards,
> Daan
> 
> On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <yuan@ece.utoronto.ca> wrote:
>> Uploaded a new patch to 19917. Changed the verbosity to debug, and addressed
>> Daan's comment on providing more distinctive text messages.
>> 
>> Sorry that I haven't split them into smaller patches.
>> 
>> Note in a few cases the original code was like:
>>         try {
>>            pstmt = txn.prepareAutoCloseStatement(sql);
>>            String gmtCutTime =
>> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime);
>>            pstmt.setString(1, gmtCutTime);
>>            pstmt.setString(2, gmtCutTime);
>> 
>>            ResultSet rs = pstmt.executeQuery();
>>            while (rs.next()) {
>>                RunningHostCountInfo info = new RunningHostCountInfo();
>>                info.setDcId(rs.getLong(1));
>>                info.setHostType(rs.getString(2));
>>                info.setCount(rs.getInt(3));
>> 
>>                l.add(info);
>>                       }
>>                 } catch (SQLException e) {
>>                 } catch (Throwable e) {
>>                 }
>> 
>> The try block only throws SQLException as checked exception, and this code
>> would also swallow any unchecked exceptions. I removed the catch (Throwable)
>> in these cases to avoid potentially swallowing any unexpected runtime
>> exceptions. Please let me know if this is not desirable so I can further
>> update.
>> 
>> Thanks,
>> Ding
>> 
>> On Apr 2, 2014, at 5:17 PM, Ding Yuan <yuan@eecg.toronto.edu> wrote:
>> 
>> Thanks all for the quick comments!
>> If i understand the discussion correctly, I will just change all the added
>> log printing statements to debug verbosity. I will upload a new patch for
>> that shortly.
>> 
>> Now a bit back story: the reason we are doing this is that we recently did
>> an analysis on many bugs collected from JIRA to understand why today's cloud
>> system fails. And we found that almost all of the cluster-wide failures are
>> caused by buggy exception handling, which would often turn a component
>> failure into a cluster-wide one. One of the common bug pattern is ignoring
>> some exceptions -- allowing them to propagate and finally turn into disaster.
>> Therefore we built a simple static checking tool just to check some simple
>> rules for exception handling, such as if an exception is ignored.
>> Admittedly, it would be much harder to reason about the potential
>> consequences caused for ignoring a certain exception, that's why without
>> much more domain knowledge I can only recommend to 1) avoid over-catching an
>> exception, especially when the handling logic will swallow it, and 2) log
>> them, as what this patch does.
>> 
>> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are
>> particularly suspicious. It might be worthwhile to double check their
>> correctness if you have time. I am reposting them below.
>> 
>> Thanks!
>> Ding
>> 
>> =========================
>> Case 1:
>> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java"
>> 
>> 326: try {
>> 327: String myIp = getGreEndpointIP(dest.getHost(), nw);
>>  .. .. ..
>> 373: } catch (Exception e) {
>> 374: // I really thing we should do a better handling of these exceptions
>> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e);
>> 376: }
>> 
>> Comment seems to suggest for a better handling.
>> =========================
>> =========================
>> Case 2:
>> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java"
>> 
>> 2284: try {
>> 2285: /*
>> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't
>> 2287: * return if propagation return non null
>> 2288: */
>> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(),
>> ResourceState.Event.UpdatePassword);
>> 2290: if (result != null) {
>> 2291: return result;
>> 2292: }
>> 2293:
>> 2294: doUpdateHostPassword(h.getId());
>> 2295: } catch (AgentUnavailableException e) {
>> 2296: }
>> 
>> Seem from the comment the logic should be fixed.
>> A similar code snipeet is at:
>>  Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java"
>> =========================
>> 
>> =========================
>> Case 3:
>> 
>>  Line: 184, File:
>> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java"
>> 
>> 174: try
>> 175: {
>> 176: success = _autoScaleService.configureAutoScaleVmGroup(this);
>> 177: if (success) {
>> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId());
>> 179: AutoScaleVmGroupResponse responseObject =
>> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup);
>> 180: setResponseObject(responseObject);
>> 181: responseObject.setResponseName(getCommandName());
>> 182: }
>> 183: } catch (Exception ex) {
>> 184: // TODO what will happen if Resource Layer fails in a step inbetween
>> 185: s_logger.warn("Failed to create autoscale vm group", ex);
>> 186: }
>> 
>> The comment, "TODO", seems to suggest for better handling.
>> =========================
>> =========================
>> Case 4:
>> 
>>  Line: 222, File: "com/cloud/api/ApiDispatcher.java"
>> This snippet is moved to ParamProcessWorker.java in the trunk.
>> 
>> 203: try {
>> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation);
>>     .. ..
>> 220: } catch (CloudRuntimeException cloudEx) {
>> 221: s_logger.error("CloudRuntimeException", cloudEx);
>> 222: // FIXME: Better error message? This only happens if the API command is
>> not executable, which typically
>> 223: //means
>> 224: // there was
>> 225: // and IllegalAccessException setting one of the parameters.
>> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal
>> error executing API command " + cmd.getCommandName().substring(0,
>> cmd.getCommandName().length() - 8));
>> 227: }
>> 
>> The "FIXME" comment seems to suggest for getter error message.
>> =========================
>> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogland@gmail.com> wrote:
>> 
>> Ding Yuan,
>> 
>> Any objections to that?
>> 
>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk
>> <Alena.Prokharchyk@citrix.com> wrote:
>> 
>> 
>> 
>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogland@gmail.com> wrote:
>> 
>> I think we agree indeed. Doesn't mean we should start this discuss
>> thread or write a arch guideline on the wiki somewhere. Maybe Ding
>> Yuan wants to do a preliminary version of it?
>> 
>> 
>> Wiki guide would be useful indeed.
>> 
>> 
>> 
>> In the meantime I don't think that it hurts for the present patch to
>> do everything in debug and decide about higher levels needed later.
>> 
>> 
>> Agree.
>> 
>> 
>> 
>> regards,
>> 
>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk
>> <Alena.Prokharchyk@citrix.com> wrote:
>> 
>> Daan,
>> 
>> Correct me if I¹m wrong, but all of the logging added by Ding, fall
>> under
>> "to go with it or to indicate passing a certain code path². I¹ve just
>> noticed that some of them were added with DEBUG, and some with WARN
>> level,
>> and wanted to correct that.
>> 
>> So we should:
>> 
>> 1) For sure: never print them out in WARN as there is nothing admin
>> should
>> do in this case, because the code just handles them by ignoring.
>> 2) Figure out what would be the correct level to log them with: INFO or
>> DEBUG
>> 
>> From ³Logging best practices² articles, I can see that people use INFO
>> as
>> a ³storyline² of normal application behavior, and DEBUG for sort of
>> information that helps you to track down the failure cases scenarios.
>> To
>> me, stuff added by Ding, falls under second category. But I might be
>> wrong
>> as I don¹t recall on the spot any discussions happening on the debug
>> topic, from the mailing list.
>> 
>> -Alena.
>> 
>> 
>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogland@gmail.com> wrote:
>> 
>> Alena,
>> 
>> What I read in your comment is a description of INFO vs WARN. Debug
>> would be only for outputting stacktraces to go with it or to indicate
>> passing a certain code path.
>> 
>> Agree?
>> 
>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk
>> <alena.prokharchyk@citrix.com> wrote:
>> 
>> 
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/19917/#review39324
>> -----------------------------------------------------------
>> 
>> 
>> Is there a reason why logs for some exceptions are being logged in
>> DEBUG mode, and some in WARN? From my point of view, if the code only
>> catches it and doesn't error out, it should be logged in DEBUG. Lots of
>> Admins are seeking for WARN statements in the log, and they might be
>> confused seeing WARN w/o further failure or retry.
>> 
>> - Alena Prokharchyk
>> 
>> 
>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote:
>> 
>> 
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/19917/
>> -----------------------------------------------------------
>> 
>> (Updated April 2, 2014, 1:55 p.m.)
>> 
>> 
>> Review request for cloudstack.
>> 
>> 
>> Repository: cloudstack-git
>> 
>> 
>> Description
>> -------
>> 
>> This is the patch for JIRA-6242. See
>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more
>> details.
>> Thanks!
>> 
>> 
>> Diffs
>> -----
>> 
>> 
>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java
>> 0d41bc1
>> 
>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager
>> Im
>> pl.java 01508a4
>> 
>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java
>> 3e088db
>> 
>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit
>> y/
>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e
>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f
>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d
>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java
>> e42eaf4
>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java
>> 34fdca5
>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java
>> 58dd916
>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0
>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java
>> 1f382d6
>> 
>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM
>> an
>> agerImpl.java 6ed1274
>> 
>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla
>> ss
>> Registry.java 83c8a42
>> 
>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer
>> ve
>> rDiscoverer.java 0ad6dc4
>> 
>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>> rC
>> onnectionPool.java b779085
>> 
>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>> rS
>> torageProcessor.java e512046
>> 
>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat
>> as
>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a
>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba
>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8
>> 
>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT
>> hu
>> mbnailHandler.java 06f21d3
>> utils/src/com/cloud/utils/net/NetUtils.java 6350986
>> 
>> Diff: https://reviews.apache.org/r/19917/diff/
>> 
>> 
>> Testing
>> -------
>> 
>> 
>> Thanks,
>> 
>> Ding Yuan
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Daan
>> 
>> 
>> 
>> 
>> 
>> --
>> Daan
>> 
>> 
>> 
>> 
>> 
>> --
>> Daan
>> 
>> 
>> 
> 
> 
> 
> -- 
> Daan
> 


Mime
View raw message