cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daan Hoogland <daan.hoogl...@gmail.com>
Subject Re: Review Request 19917: Improvements on exception handlers (JIRA-6242)
Date Sat, 05 Apr 2014 09:02:40 GMT
H Ding,

I didn't mean for you to test it in a certain way. I was just curious
about what you did with the patch before you submitted it. Did you
start a cloud instance with it and try to hit any of the log messages
for instance. You explained the background of your effort and I am
curious as to how it satisfied your objectives.

I will let the patch rest a few days to see if we get any more
reactions and apply it from Denver next week.

On Sat, Apr 5, 2014 at 2:03 AM, Ding Yuan <yuan@ece.utoronto.ca> wrote:
> HI Daan,
> I am not sure exactly how to monkey-test cloudstack, what I did was to do
> $ mvn test
> which shows "BUILD SUCCESS". I also did this:
> $ mvn clean install -P systemvm,developer
> which also succeeded.
>
> Is that what you mean? If not, please let me know what to do and I will further test
it.
> I will also work on assigning the proper reviewers now.
>
> Thanks,
> Ding
>
> On Apr 3, 2014, at 12:04 PM, Daan Hoogland <daan.hoogland@gmail.com> wrote:
>
>> thanks Ding,
>>
>> I saw your update. Did your run a cloud with this code; i.e. did you
>> monkey-test it?
>>
>> On Thu, Apr 3, 2014 at 5:26 PM, Ding Yuan <yuan@ece.utoronto.ca> wrote:
>>> Oops, sorry I didn't publish my diff. I just published it on review board.
>>> Thanks for the comment Daan! Please let me know if I further need to improve
it.
>>>
>>> Ding
>>>
>>> On Apr 3, 2014, at 9:52 AM, Daan Hoogland <daan.hoogland@gmail.com> wrote:
>>>
>>>> Ding,
>>>>
>>>> I think we can dare to do so in master as it will not see release for
>>>> a while. We'll just have to be aware of the locations and be on alert
>>>> for any stacktraces that will pass this list. I would not like to do
>>>> this on the 4.4 branch even when it is an improvement of code quality
>>>> as such. It might do things or prevent things from happening that we
>>>> need done.
>>>>
>>>> I don't see a new version of the diff in the review request. Did you
>>>> 'Update' -> 'Upload Diff'?
>>>>
>>>> regards,
>>>> Daan
>>>>
>>>> On Thu, Apr 3, 2014 at 12:34 AM, Ding Yuan <yuan@ece.utoronto.ca> wrote:
>>>>> Uploaded a new patch to 19917. Changed the verbosity to debug, and addressed
>>>>> Daan's comment on providing more distinctive text messages.
>>>>>
>>>>> Sorry that I haven't split them into smaller patches.
>>>>>
>>>>> Note in a few cases the original code was like:
>>>>>        try {
>>>>>           pstmt = txn.prepareAutoCloseStatement(sql);
>>>>>           String gmtCutTime =
>>>>> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), cutTime);
>>>>>           pstmt.setString(1, gmtCutTime);
>>>>>           pstmt.setString(2, gmtCutTime);
>>>>>
>>>>>           ResultSet rs = pstmt.executeQuery();
>>>>>           while (rs.next()) {
>>>>>               RunningHostCountInfo info = new RunningHostCountInfo();
>>>>>               info.setDcId(rs.getLong(1));
>>>>>               info.setHostType(rs.getString(2));
>>>>>               info.setCount(rs.getInt(3));
>>>>>
>>>>>               l.add(info);
>>>>>                      }
>>>>>                } catch (SQLException e) {
>>>>>                } catch (Throwable e) {
>>>>>                }
>>>>>
>>>>> The try block only throws SQLException as checked exception, and this
code
>>>>> would also swallow any unchecked exceptions. I removed the catch (Throwable)
>>>>> in these cases to avoid potentially swallowing any unexpected runtime
>>>>> exceptions. Please let me know if this is not desirable so I can further
>>>>> update.
>>>>>
>>>>> Thanks,
>>>>> Ding
>>>>>
>>>>> On Apr 2, 2014, at 5:17 PM, Ding Yuan <yuan@eecg.toronto.edu> wrote:
>>>>>
>>>>> Thanks all for the quick comments!
>>>>> If i understand the discussion correctly, I will just change all the
added
>>>>> log printing statements to debug verbosity. I will upload a new patch
for
>>>>> that shortly.
>>>>>
>>>>> Now a bit back story: the reason we are doing this is that we recently
did
>>>>> an analysis on many bugs collected from JIRA to understand why today's
cloud
>>>>> system fails. And we found that almost all of the cluster-wide failures
are
>>>>> caused by buggy exception handling, which would often turn a component
>>>>> failure into a cluster-wide one. One of the common bug pattern is ignoring
>>>>> some exceptions -- allowing them to propagate and finally turn into disaster.
>>>>> Therefore we built a simple static checking tool just to check some simple
>>>>> rules for exception handling, such as if an exception is ignored.
>>>>> Admittedly, it would be much harder to reason about the potential
>>>>> consequences caused for ignoring a certain exception, that's why without
>>>>> much more domain knowledge I can only recommend to 1) avoid over-catching
an
>>>>> exception, especially when the handling logic will swallow it, and 2)
log
>>>>> them, as what this patch does.
>>>>>
>>>>> Nevertheless, it seems the four cases I mentioned in JIRA-6242 are
>>>>> particularly suspicious. It might be worthwhile to double check their
>>>>> correctness if you have time. I am reposting them below.
>>>>>
>>>>> Thanks!
>>>>> Ding
>>>>>
>>>>> =========================
>>>>> Case 1:
>>>>> Line: 375, File: "com/cloud/network/ovs/OvsTunnelManagerImpl.java"
>>>>>
>>>>> 326: try {
>>>>> 327: String myIp = getGreEndpointIP(dest.getHost(), nw);
>>>>> .. .. ..
>>>>> 373: } catch (Exception e) {
>>>>> 374: // I really thing we should do a better handling of these exceptions
>>>>> 375: s_logger.warn("Ovs Tunnel network created tunnel failed", e);
>>>>> 376: }
>>>>>
>>>>> Comment seems to suggest for a better handling.
>>>>> =========================
>>>>> =========================
>>>>> Case 2:
>>>>> Line: 2295, File: "com/cloud/resource/ResourceManagerImpl.java"
>>>>>
>>>>> 2284: try {
>>>>> 2285: /*
>>>>> 2286: * FIXME: this is a buggy logic, check with alex. Shouldn't
>>>>> 2287: * return if propagation return non null
>>>>> 2288: */
>>>>> 2289: Boolean result = _clusterMgr.propagateResourceEvent(h.getId(),
>>>>> ResourceState.Event.UpdatePassword);
>>>>> 2290: if (result != null) {
>>>>> 2291: return result;
>>>>> 2292: }
>>>>> 2293:
>>>>> 2294: doUpdateHostPassword(h.getId());
>>>>> 2295: } catch (AgentUnavailableException e) {
>>>>> 2296: }
>>>>>
>>>>> Seem from the comment the logic should be fixed.
>>>>> A similar code snipeet is at:
>>>>> Line: 2276, File: "com/cloud/resource/ResourceManagerImpl.java"
>>>>> =========================
>>>>>
>>>>> =========================
>>>>> Case 3:
>>>>>
>>>>> Line: 184, File:
>>>>> "org/apache/cloudstack/api/command/user/autoscale/CreateAutoScaleVmGroupCmd.java"
>>>>>
>>>>> 174: try
>>>>> 175: {
>>>>> 176: success = _autoScaleService.configureAutoScaleVmGroup(this);
>>>>> 177: if (success) {
>>>>> 178: vmGroup = _entityMgr.findById(AutoScaleVmGroup.class, getEntityId());
>>>>> 179: AutoScaleVmGroupResponse responseObject =
>>>>> _responseGenerator.createAutoScaleVmGroupResponse(vmGroup);
>>>>> 180: setResponseObject(responseObject);
>>>>> 181: responseObject.setResponseName(getCommandName());
>>>>> 182: }
>>>>> 183: } catch (Exception ex) {
>>>>> 184: // TODO what will happen if Resource Layer fails in a step inbetween
>>>>> 185: s_logger.warn("Failed to create autoscale vm group", ex);
>>>>> 186: }
>>>>>
>>>>> The comment, "TODO", seems to suggest for better handling.
>>>>> =========================
>>>>> =========================
>>>>> Case 4:
>>>>>
>>>>> Line: 222, File: "com/cloud/api/ApiDispatcher.java"
>>>>> This snippet is moved to ParamProcessWorker.java in the trunk.
>>>>>
>>>>> 203: try {
>>>>> 204: setFieldValue(field, cmd, paramObj, parameterAnnotation);
>>>>>    .. ..
>>>>> 220: } catch (CloudRuntimeException cloudEx) {
>>>>> 221: s_logger.error("CloudRuntimeException", cloudEx);
>>>>> 222: // FIXME: Better error message? This only happens if the API command
is
>>>>> not executable, which typically
>>>>> 223: //means
>>>>> 224: // there was
>>>>> 225: // and IllegalAccessException setting one of the parameters.
>>>>> 226: throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, "Internal
>>>>> error executing API command " + cmd.getCommandName().substring(0,
>>>>> cmd.getCommandName().length() - 8));
>>>>> 227: }
>>>>>
>>>>> The "FIXME" comment seems to suggest for getter error message.
>>>>> =========================
>>>>> On Apr 2, 2014, at 4:37 PM, Daan Hoogland <daan.hoogland@gmail.com>
wrote:
>>>>>
>>>>> Ding Yuan,
>>>>>
>>>>> Any objections to that?
>>>>>
>>>>> On Wed, Apr 2, 2014 at 10:32 PM, Alena Prokharchyk
>>>>> <Alena.Prokharchyk@citrix.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 4/2/14, 1:27 PM, "Daan Hoogland" <daan.hoogland@gmail.com> wrote:
>>>>>
>>>>> I think we agree indeed. Doesn't mean we should start this discuss
>>>>> thread or write a arch guideline on the wiki somewhere. Maybe Ding
>>>>> Yuan wants to do a preliminary version of it?
>>>>>
>>>>>
>>>>> Wiki guide would be useful indeed.
>>>>>
>>>>>
>>>>>
>>>>> In the meantime I don't think that it hurts for the present patch to
>>>>> do everything in debug and decide about higher levels needed later.
>>>>>
>>>>>
>>>>> Agree.
>>>>>
>>>>>
>>>>>
>>>>> regards,
>>>>>
>>>>> On Wed, Apr 2, 2014 at 10:11 PM, Alena Prokharchyk
>>>>> <Alena.Prokharchyk@citrix.com> wrote:
>>>>>
>>>>> Daan,
>>>>>
>>>>> Correct me if I¹m wrong, but all of the logging added by Ding, fall
>>>>> under
>>>>> "to go with it or to indicate passing a certain code path². I¹ve just
>>>>> noticed that some of them were added with DEBUG, and some with WARN
>>>>> level,
>>>>> and wanted to correct that.
>>>>>
>>>>> So we should:
>>>>>
>>>>> 1) For sure: never print them out in WARN as there is nothing admin
>>>>> should
>>>>> do in this case, because the code just handles them by ignoring.
>>>>> 2) Figure out what would be the correct level to log them with: INFO
or
>>>>> DEBUG
>>>>>
>>>>> From ³Logging best practices² articles, I can see that people use INFO
>>>>> as
>>>>> a ³storyline² of normal application behavior, and DEBUG for sort of
>>>>> information that helps you to track down the failure cases scenarios.
>>>>> To
>>>>> me, stuff added by Ding, falls under second category. But I might be
>>>>> wrong
>>>>> as I don¹t recall on the spot any discussions happening on the debug
>>>>> topic, from the mailing list.
>>>>>
>>>>> -Alena.
>>>>>
>>>>>
>>>>> On 4/2/14, 12:57 PM, "Daan Hoogland" <daan.hoogland@gmail.com>
wrote:
>>>>>
>>>>> Alena,
>>>>>
>>>>> What I read in your comment is a description of INFO vs WARN. Debug
>>>>> would be only for outputting stacktraces to go with it or to indicate
>>>>> passing a certain code path.
>>>>>
>>>>> Agree?
>>>>>
>>>>> On Wed, Apr 2, 2014 at 8:31 PM, Alena Prokharchyk
>>>>> <alena.prokharchyk@citrix.com> wrote:
>>>>>
>>>>>
>>>>> -----------------------------------------------------------
>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>> https://reviews.apache.org/r/19917/#review39324
>>>>> -----------------------------------------------------------
>>>>>
>>>>>
>>>>> Is there a reason why logs for some exceptions are being logged in
>>>>> DEBUG mode, and some in WARN? From my point of view, if the code only
>>>>> catches it and doesn't error out, it should be logged in DEBUG. Lots
of
>>>>> Admins are seeking for WARN statements in the log, and they might be
>>>>> confused seeing WARN w/o further failure or retry.
>>>>>
>>>>> - Alena Prokharchyk
>>>>>
>>>>>
>>>>> On April 2, 2014, 1:55 p.m., Ding Yuan wrote:
>>>>>
>>>>>
>>>>> -----------------------------------------------------------
>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>> https://reviews.apache.org/r/19917/
>>>>> -----------------------------------------------------------
>>>>>
>>>>> (Updated April 2, 2014, 1:55 p.m.)
>>>>>
>>>>>
>>>>> Review request for cloudstack.
>>>>>
>>>>>
>>>>> Repository: cloudstack-git
>>>>>
>>>>>
>>>>> Description
>>>>> -------
>>>>>
>>>>> This is the patch for JIRA-6242. See
>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6242 for more
>>>>> details.
>>>>> Thanks!
>>>>>
>>>>>
>>>>> Diffs
>>>>> -----
>>>>>
>>>>>
>>>>> engine/orchestration/src/com/cloud/agent/manager/AgentManagerImpl.java
>>>>> 0d41bc1
>>>>>
>>>>> engine/orchestration/src/com/cloud/agent/manager/ClusteredAgentManager
>>>>> Im
>>>>> pl.java 01508a4
>>>>>
>>>>> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>>> 3e088db
>>>>>
>>>>> engine/orchestration/src/org/apache/cloudstack/engine/datacenter/entit
>>>>> y/
>>>>> api/db/dao/EngineDataCenterDaoImpl.java 4b6818e
>>>>> engine/schema/src/com/cloud/dc/dao/DataCenterDaoImpl.java ea5039f
>>>>> engine/schema/src/com/cloud/host/dao/HostDaoImpl.java 426c90d
>>>>> engine/schema/src/com/cloud/storage/dao/StoragePoolHostDaoImpl.java
>>>>> e42eaf4
>>>>> engine/schema/src/com/cloud/storage/dao/VMTemplateDaoImpl.java
>>>>> 34fdca5
>>>>> engine/schema/src/com/cloud/upgrade/dao/Upgrade2214to30.java
>>>>> 58dd916
>>>>> engine/schema/src/com/cloud/vm/dao/ConsoleProxyDaoImpl.java 5e9c2f0
>>>>> engine/schema/src/com/cloud/vm/dao/SecondaryStorageVmDaoImpl.java
>>>>> 1f382d6
>>>>>
>>>>> engine/storage/src/org/apache/cloudstack/storage/datastore/DataObjectM
>>>>> an
>>>>> agerImpl.java 6ed1274
>>>>>
>>>>> framework/ipc/src/org/apache/cloudstack/framework/serializer/OnwireCla
>>>>> ss
>>>>> Registry.java 83c8a42
>>>>>
>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/discoverer/XcpSer
>>>>> ve
>>>>> rDiscoverer.java 0ad6dc4
>>>>>
>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>>>> rC
>>>>> onnectionPool.java b779085
>>>>>
>>>>> plugins/hypervisors/xen/src/com/cloud/hypervisor/xen/resource/XenServe
>>>>> rS
>>>>> torageProcessor.java e512046
>>>>>
>>>>> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/dat
>>>>> as
>>>>> tore/lifecycle/SolidFirePrimaryDataStoreLifeCycle.java af6a77a
>>>>> server/src/com/cloud/resource/ResourceManagerImpl.java f9a59ba
>>>>> server/src/com/cloud/server/ConfigurationServerImpl.java b8da4c8
>>>>>
>>>>> services/console-proxy/server/src/com/cloud/consoleproxy/ConsoleProxyT
>>>>> hu
>>>>> mbnailHandler.java 06f21d3
>>>>> utils/src/com/cloud/utils/net/NetUtils.java 6350986
>>>>>
>>>>> Diff: https://reviews.apache.org/r/19917/diff/
>>>>>
>>>>>
>>>>> Testing
>>>>> -------
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ding Yuan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Daan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Daan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Daan
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Daan
>>>>
>>>
>>
>>
>>
>> --
>> Daan
>>
>



-- 
Daan

Mime
View raw message