cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daan Hoogland <daan.hoogl...@gmail.com>
Subject Re: Review Request 19917: Improvements on exception handlers (JIRA-6242)
Date Thu, 03 Apr 2014 16:39:42 GMT
btw Ding,

You didn't add people to the review request you made. I was triggered
by something (must have been the title) I don't remember to look at it
but sometimes they lay around when no reviewer is assigned.

As I just stated in another thread: look at the annotations and/or git
log and see who did the most changes in the code, then add those
people as reviewers.

regards,
Daan

On Thu, Apr 3, 2014 at 6:04 PM, Daan Hoogland <daan.hoogland@gmail.com> wrote:
> thanks Ding,
>
> I saw your update. Did your run a cloud with this code; i.e. did you
> monkey-test it?
>
> On Thu, Apr 3, 2014 at 5:26 PM, Ding Yuan <yuan@ece.utoronto.ca> wrote:
>> Oops, sorry I didn't publish my diff. I just published it on review board.
>> Thanks for the comment Daan! Please let me know if I further need to improve it.
>>
>> Ding
>>
>> On Apr 3, 2014, at 9:52 AM, Daan Hoogland <daan.hoogland@gmail.com> wrote:
>>
>>> Ding,
>>>
>>> I think we can dare to do so in master as it will not see release for
>>> a while. We'll just have to be aware of the locations and be on alert
>>> for any stacktraces that will pass this list. I would not like to do
>>> this on the 4.4 branch even when it is an improvement of code quality
>>> as such. It might do things or prevent things from happening that we
>>> need done.
>>>
>>> I don't see a new version of the diff in the review request. Did you
>>> 'Update' -> 'Upload Diff'?
>>>
>>> regards,
>>> Daan
>>>
>>> On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <yuan@ece.utoronto.ca> wrote:
>>>> Uploaded a new patch to 19917. Changed the verbosity to debug, and addressed
>>>> Daan's comment on providing more distinctive text messages.
>>>>
>>>> Sorry that I haven't split them into smaller patches.
>>>>
>>>> Note in a few cases the original code was like:
>>>>         try {
>>>>            pstmt = txn.prepareAutoCloseStatement(sql);
>>>>            String gmtCutTime =
>>>> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime);
>>>>            pstmt.setString(1, gmtCutTime);
>>>>            pstmt.setString(2, gmtCutTime);
>>>>
>>>>            ResultSet rs = pstmt.executeQuery();
>>>>            while (rs.next()) {
>>>>                RunningHostCountInfo info = new RunningHostCountInfo();
>>>>                info.setDcId(rs.getLong(1));
>>>>                info.setHostType(rs.getString(2));
>>>>                info.setCount(rs.getInt(3));
>>>>
>>>>                l.add(info);
>>>>                       }
>>>>                 } catch (SQLException e) {
>>>>                 } catch (Throwable e) {
>>>>                 }
>>>>
>>>> The try block only throws SQLException as checked exception, and this code
>>>> would also swallow any unchecked exceptions. I removed the catch (Throwable)
>>>> in these cases to avoid potentially swallowing any unexpected runtime
>>>> exceptions. Please let me know if this is not desirable so I can further
>>>> update.
>>>>
>>>> Thanks,
>>>> Ding
>>>>
>>>> On Apr 2, 2014, at 5:17 PM, Ding Yuan <yuan@eecg.toronto.edu> wrote:
>>>>
>>>> Thanks all for the quick comments!
>>>> If i understand the discussion correctly, I will just change all the added
>>>> log printing statements to debug verbosity. I will upload a new patch for
>>>> that shortly.
>>>>
>>>> Now a bit back story: the reason we are doing this is that we recently did
>>>> an analysis on many bugs collected from JIRA to understand why today's cloud
>>>> system fails. And we found that almost all of the cluster-wide failures are
>>>> caused by buggy exception handling, which would often turn a component
>>>> failure into a cluster-wide one. One of the common bug pattern is ignoring
>>>> some exceptions -- allowing them to propagate and finally turn into disaster.
>>>> Therefore we built a simple static checking tool just to check some simple
>>>> rules for exception handling, such as if an exception is ignored.
>>>> Admittedly, it would be much harder to reason about the potential
>>>> consequences caused for ignoring a certain exception, that's why without
>>>> much more domain knowledge I can only recommend to 1) avoid over-catching
an
>>>> exception, especially when the handling logic will swallow it, and 2) log
>>>> them, as what this patch does.
>>>>
>>>> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are
>>>> particularly suspicious. It might be worthwhile to double check their
>>>> correctness if you have time. I am reposting them below.
>>>>
>>>> Thanks!
>>>> Ding
>>>>
>>>> =========================
>>>> Case 1:
>>>> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java"
>>>>
>>>> 326: try {
>>>> 327: String myIp = getGreEndpointIP(dest.getHost(), nw);
>>>>  .. .. ..
>>>> 373: } catch (Exception e) {
>>>> 374: // I really thing we should do a better handling of these exceptions
>>>> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e);
>>>> 376: }
>>>>
>>>> Comment seems to suggest for a better handling.
>>>> =========================
>>>> =========================
>>>> Case 2:
>>>> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java"
>>>>
>>>> 2284: try {
>>>> 2285: /*
>>>> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't
>>>> 2287: * return if propagation return non null
>>>> 2288: */
>>>> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(),
>>>> ResourceState.Event.UpdatePassword);
>>>> 2290: if (result != null) {
>>>> 2291: return result;
>>>> 2292: }
>>>> 2293:
>>>> 2294: doUpdateHostPassword(h.getId());
>>>> 2295: } catch (AgentUnavailableException e) {
>>>> 2296: }
>>>>
>>>> Seem from the comment the logic should be fixed.
>>>> A similar code snipeet is at:
>>>>  Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java"
>>>> =========================
>>>>
>>>> =========================
>>>> Case 3:
>>>>
>>>>  Line: 184, File:
>>>> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java"
>>>>
>>>> 174: try
>>>> 175: {
>>>> 176: success = _autoScaleService.configureAutoScaleVmGroup(this);
>>>> 177: if (success) {
>>>> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId());
>>>> 179: AutoScaleVmGroupResponse responseObject =
>>>> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup);
>>>> 180: setResponseObject(responseObject);
>>>> 181: responseObject.setResponseName(getCommandName());
>>>> 182: }
>>>> 183: } catch (Exception ex) {
>>>> 184: // TODO what will happen if Resource Layer fails in a step inbetween
>>>> 185: s_logger.warn("Failed to create autoscale vm group", ex);
>>>> 186: }
>>>>
>>>> The comment, "TODO", seems to suggest for better handling.
>>>> =========================
>>>> =========================
>>>> Case 4:
>>>>
>>>>  Line: 222, File: "com/cloud/api/ApiDispatcher.java"
>>>> This snippet is moved to ParamProcessWorker.java in the trunk.
>>>>
>>>> 203: try {
>>>> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation);
>>>>     .. ..
>>>> 220: } catch (CloudRuntimeException cloudEx) {
>>>> 221: s_logger.error("CloudRuntimeException", cloudEx);
>>>> 222: // FIXME: Better error message? This only happens if the API command
is
>>>> not executable, which typically
>>>> 223: //means
>>>> 224: // there was
>>>> 225: // and IllegalAccessException setting one of the parameters.
>>>> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal
>>>> error executing API command " + cmd.getCommandName().substring(0,
>>>> cmd.getCommandName().length() - 8));
>>>> 227: }
>>>>
>>>> The "FIXME" comment seems to suggest for getter error message.
>>>> =========================
>>>> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogland@gmail.com>
wrote:
>>>>
>>>> Ding Yuan,
>>>>
>>>> Any objections to that?
>>>>
>>>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk
>>>> <Alena.Prokharchyk@citrix.com> wrote:
>>>>
>>>>
>>>>
>>>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogland@gmail.com> wrote:
>>>>
>>>> I think we agree indeed. Doesn't mean we should start this discuss
>>>> thread or write a arch guideline on the wiki somewhere. Maybe Ding
>>>> Yuan wants to do a preliminary version of it?
>>>>
>>>>
>>>> Wiki guide would be useful indeed.
>>>>
>>>>
>>>>
>>>> In the meantime I don't think that it hurts for the present patch to
>>>> do everything in debug and decide about higher levels needed later.
>>>>
>>>>
>>>> Agree.
>>>>
>>>>
>>>>
>>>> regards,
>>>>
>>>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk
>>>> <Alena.Prokharchyk@citrix.com> wrote:
>>>>
>>>> Daan,
>>>>
>>>> Correct me if I¹m wrong, but all of the logging added by Ding, fall
>>>> under
>>>> "to go with it or to indicate passing a certain code path². I¹ve just
>>>> noticed that some of them were added with DEBUG, and some with WARN
>>>> level,
>>>> and wanted to correct that.
>>>>
>>>> So we should:
>>>>
>>>> 1) For sure: never print them out in WARN as there is nothing admin
>>>> should
>>>> do in this case, because the code just handles them by ignoring.
>>>> 2) Figure out what would be the correct level to log them with: INFO or
>>>> DEBUG
>>>>
>>>> From ³Logging best practices² articles, I can see that people use INFO
>>>> as
>>>> a ³storyline² of normal application behavior, and DEBUG for sort of
>>>> information that helps you to track down the failure cases scenarios.
>>>> To
>>>> me, stuff added by Ding, falls under second category. But I might be
>>>> wrong
>>>> as I don¹t recall on the spot any discussions happening on the debug
>>>> topic, from the mailing list.
>>>>
>>>> -Alena.
>>>>
>>>>
>>>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogland@gmail.com> wrote:
>>>>
>>>> Alena,
>>>>
>>>> What I read in your comment is a description of INFO vs WARN. Debug
>>>> would be only for outputting stacktraces to go with it or to indicate
>>>> passing a certain code path.
>>>>
>>>> Agree?
>>>>
>>>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk
>>>> <alena.prokharchyk@citrix.com> wrote:
>>>>
>>>>
>>>> -----------------------------------------------------------
>>>> This is an automatically generated e-mail. To reply, visit:
>>>> https://reviews.apache.org/r/19917/#review39324
>>>> -----------------------------------------------------------
>>>>
>>>>
>>>> Is there a reason why logs for some exceptions are being logged in
>>>> DEBUG mode, and some in WARN? From my point of view, if the code only
>>>> catches it and doesn't error out, it should be logged in DEBUG. Lots of
>>>> Admins are seeking for WARN statements in the log, and they might be
>>>> confused seeing WARN w/o further failure or retry.
>>>>
>>>> - Alena Prokharchyk
>>>>
>>>>
>>>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote:
>>>>
>>>>
>>>> -----------------------------------------------------------
>>>> This is an automatically generated e-mail. To reply, visit:
>>>> https://reviews.apache.org/r/19917/
>>>> -----------------------------------------------------------
>>>>
>>>> (Updated April 2, 2014, 1:55 p.m.)
>>>>
>>>>
>>>> Review request for cloudstack.
>>>>
>>>>
>>>> Repository: cloudstack-git
>>>>
>>>>
>>>> Description
>>>> -------
>>>>
>>>> This is the patch for JIRA-6242. See
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more
>>>> details.
>>>> Thanks!
>>>>
>>>>
>>>> Diffs
>>>> -----
>>>>
>>>>
>>>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java
>>>> 0d41bc1
>>>>
>>>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager
>>>> Im
>>>> pl.java 01508a4
>>>>
>>>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>> 3e088db
>>>>
>>>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit
>>>> y/
>>>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e
>>>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f
>>>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d
>>>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java
>>>> e42eaf4
>>>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java
>>>> 34fdca5
>>>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java
>>>> 58dd916
>>>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0
>>>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java
>>>> 1f382d6
>>>>
>>>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM
>>>> an
>>>> agerImpl.java 6ed1274
>>>>
>>>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla
>>>> ss
>>>> Registry.java 83c8a42
>>>>
>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer
>>>> ve
>>>> rDiscoverer.java 0ad6dc4
>>>>
>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>>> rC
>>>> onnectionPool.java b779085
>>>>
>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>>> rS
>>>> torageProcessor.java e512046
>>>>
>>>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat
>>>> as
>>>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a
>>>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba
>>>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8
>>>>
>>>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT
>>>> hu
>>>> mbnailHandler.java 06f21d3
>>>> utils/src/com/cloud/utils/net/NetUtils.java 6350986
>>>>
>>>> Diff: https://reviews.apache.org/r/19917/diff/
>>>>
>>>>
>>>> Testing
>>>> -------
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Ding Yuan
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Daan
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Daan
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Daan
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Daan
>>>
>>
>
>
>
> --
> Daan



-- 
Daan

Mime
View raw message