cloudstack-dev mailing list archives

From Rohit Yadav <rohit.ya...@shapeblue.com>
Subject Re: 4.8, 4.9, and master Testing Status
Date Wed, 19 Oct 2016 11:39:18 GMT
All,


At least one more test/review LGTM is required on the following PRs. Please help with your
reviews/tests, as they will help us work towards cutting the RCs:


https://github.com/apache/cloudstack/pull/1692

https://github.com/apache/cloudstack/pull/1703

https://github.com/apache/cloudstack/pull/1708


As John shared in the previous email, we still have outstanding failures, and we'll be
working towards fixing all three of them. If we can get these PRs merged before the end of
the week, we can trigger tests on them again to learn the test status of each of the 4.8,
4.9, and master branches. That will help us determine the quality of each branch so that we
can work towards RCs.


Regards.

________________________________
From: John Burwell <john.burwell@shapeblue.com>
Sent: 14 October 2016 12:11:04
To: dev@cloudstack.apache.org
Subject: Re: 4.8, 4.9, and master Testing Status

All,

We have made great strides stabilizing the 4.8 [1] and 4.9 [2] smoke tests.  While we are
not super green, the following remaining failures/issues are isolated to the VPC VR and secondary
storage.

        * CLOUDSTACK-9541: redundant VPC VR: issues when master and backup switch happens
on failover [3]
        * CLOUDSTACK-9540: createPrivateGateway create private network does not create proper
VLAN network on XenServer [4]
        * CLOUDSTACK-9528: SSVM Downloads (built-in) template multiple times

Therefore, I would like to merge these two PRs so that we can begin the process of rebasing
and retesting the PRs slotted for 4.8 and 4.9 that are not affected by these issues (i.e.
PRs unrelated to secondary storage or the VR).  Our hope is that we can correct these issues
quickly, and by the time we have worked through the backlog of pending PRs, these issues will
be addressed and we can move those impacted forward.

Unfortunately, the master PR [5] has 6 failures and 4 errors on XenServer [6] that we are
currently analyzing.  We hope to have these resolved shortly in order to begin progressing
PRs targeting master.

I would like to get 1692 [1] and 1703 [2] merged in the next 24 hours.  We need to complete
the following actions in order to accomplish this goal:

        * Obtain at least one code review LGTM on PR #1692 [1]
        * Obtain at least one code review LGTM on PR #1703 [2]
        * Obtain at least one test review LGTM on PR #1703 [2]

Once these PRs are merged, I will be updating PRs slotted for 4.8 and 4.9 to ping authors for
a rebase.  Following each rebase, we will trigger blueorangutan to retest each one.

Thanks again for your patience and assistance,
-John

[1]: https://github.com/apache/cloudstack/pull/1692
[2]: https://github.com/apache/cloudstack/pull/1703
[3]: https://issues.apache.org/jira/browse/CLOUDSTACK-9541
[4]: https://issues.apache.org/jira/browse/CLOUDSTACK-9540
[5]: https://github.com/apache/cloudstack/pull/1708
[6]: https://github.com/apache/cloudstack/pull/1708#issuecomment-253698099

> On Oct 7, 2016, at 10:12 AM, Will Stevens <wstevens@cloudops.com> wrote:
>
> Great work everyone.  Don't worry about the sporadic updates, that is just
> the nature of the beast when working through stuff like this.  Well done so
> far...
>
> *Will STEVENS*
> Lead Developer
>
> *CloudOps* *| *Cloud Solutions Experts
> 420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6
> w cloudops.com *|* tw @CloudOps_
>
> On Fri, Oct 7, 2016 at 9:53 AM, John Burwell <john.burwell@shapeblue.com>
> wrote:
>
>> All,
>>
>> Thank you Ilya and Haijiao for your words of encouragement.  In addition to
>> the efforts of Paul, Rohit, Murali, Abhi, and Bobby, Sergey Levitskiy has
>> been providing great help testing VMware.
>>
>> I apologize for my sporadic status updates.  We have made significant
>> progress in getting smoke tests to pass on KVM, XenServer, and VMware.
>> Currently, we have the following number of failures and errors:
>>
>>        * KVM: 0
>>        * VMware: 4
>>        * XenServer: 8
>>
>> The outstanding failures and errors seem to be caused by the following
>> issues:
>>
>>        1. On VMware and XenServer, guest VMs in VPCs start but don’t
>> acquire IP addresses, causing tests that rely on SSH connectivity to
>> fail.  The issue does not occur on KVM, occurs intermittently on VMware,
>> and occurs consistently on XenServer.  This issue affects the
>> test_vpc_redundant, test_privategw_acl, and test_vpc_vpn test suites.  We
>> believe that this issue may be caused by either the guest VM startup/DHCP
>> wait period winning the race with the VPC VR configuration or a problem on
>> the VPC VR assigning IP addresses.  We are currently investigating and
>> expect to identify the root cause shortly.
>>        2. SSVM downloads are being restarted due to ping timeouts on
>> XenServer and VMware.  We are seeing messages such as the following in the
>> Management Server logs:
>>
>>                com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:5,com.cloud.exception.OperationTimedoutException: Commands 9042102151853113352 to Host 5 timed out after 2400
>>
>>          Our initial investigation discovered different time zones being
>> used by the system VM templates and Management Server.  We have modified
>> Trillian to ensure consistent configuration of time zones across a
>> cluster, and are preparing another run for XenServer and VMware.  KVM is
>> not affected by this time zone issue because KVM hosts use the same CentOS
>> template as CentOS-based Management Servers -- creating time zone
>> consistency by side effect.
>>
>> Reports of each test run are available on PR #1692 [1].  We have kicked off a
>> new round of tests on KVM, VMware, and XenServer with the time zone fix and
>> additional instrumentation to run down the VPC VR race condition.
>>
>> Instead of directly forward merging these changes, we plan to open a PR
>> for each forward merge.  Since we are very close to having 4.8 resolved,
>> Rohit has opened PR 1703 [2] for the 4.9 forward merge and kicked off a test
>> run.  While we cannot close this PR until 1692 is complete, we are hoping
>> to get a head start on any issues in the 4.9 branch.
>>
>> Thank you again for your patience,
>> -John
>>
>> [1]: https://github.com/apache/cloudstack/pull/1692
>> [2]: https://github.com/apache/cloudstack/pull/1703
>>
>>> On Oct 5, 2016, at 4:32 AM, Haijiao <18602198181@163.com> wrote:
>>>
>>> Though I am one of the silent majority, I would like to thank John and the
>>> dev team for your continuous effort; you keep ACS alive and better!
>>>
>>>
>>> I just heard that one of the biggest finance companies in China is running
>>> 10,000+ VMs on ACS 4.4 for production/dev/QAS; you guys should be proud of that.
>>> Salute to you!
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 5 Oct 2016 at 03:09, "ilya" <ilya.mailing.lists@gmail.com> wrote:
>>>
>>> John and Team
>>>
>>> Thanks for amazing work and contributing back.
>>>
>>> Regards,
>>> ilya
>>>
>>> On 10/3/16 9:48 PM, John Burwell wrote:
>>>> All,
>>>>
>>>> A quick update on our progress to pass all smoke tests aka super
>>>> green.  We have reduced the failures and errors for XenServer from 93 to 9
>>>> and for VMware from 51 to 14.  A CentOS 6/CentOS 6 KVM run is currently
>>>> executing.  Based on manual tests/fixes, we are expecting it to be the first
>>>> super green configuration.  We have also found the following additional
>>>> defects:
>>>>
>>>> * CLOUDSTACK-9528 [2]: SSVM Downloads (built-in) Template Multiple
>> Times
>>>> * CLOUDSTACK-9529 [3]: Marvin Tests Do Not Clean Up Properly
>>>>
>>>> 9528 is causing XenServer environments to fail to install and start up
>> cleanly.  A lack of cleanup described in 9529 is causing XenServer to
>> exhaust available resources before a test run completes.  We believe that
>> resolution of these issues will address most, if not all, of the XenServer
>> issues.
>>>>
>>>> Thanks,
>>>> -John
>>>>
>>>> [1]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65873020
>>>> [2]: https://issues.apache.org/jira/browse/CLOUDSTACK-9528
>>>> [3]: https://issues.apache.org/jira/browse/CLOUDSTACK-9529
>>>>
>>>>>
>>>> john.burwell@shapeblue.com
>>>> www.shapeblue.com<http://www.shapeblue.com>
>>>> 53 Chandos Place, Covent Garden, London VA WC2N 4HSUK
>>>> @shapeblue
>>>>
>>>>
>>>>
>>>> On Sep 30, 2016, at 2:40 AM, John Burwell <john.burwell@shapeblue.com> wrote:
>>>>>
>>>>> All,
>>>>>
>>>>> Using blueorangutan, Rohit, Murali, Boris, Paul, Abhi, and I are
>> executing the smoke tests for the 4.8, 4.9, and master branches against the
>> following environments:
>>>>>
>>>>>   * CentOS 7.2 Management Server + VMware 5.5u3 + NFS
>> Primary/Secondary Storage
>>>>>   * CentOS 7.2 Management Server + XenServer 6.5SP1 + NFS
>> Primary/Secondary Storage
>>>>>   * CentOS 7.2 Management Server + CentOS 7.2 KVM + NFS
>> Primary/Secondary Storage
>>>>>
>>>>> Thus far, we have found seven (7) test case and/or CloudStack defects
>> in the VMware run for the 4.8 branch [1].  We are currently triaging
>> fifty-one (51) new issues from the XenServer run to determine which issues
>> are environmental and which are defects.  This triage work should be completed today
>> (30 Sept 2016).  Finally, we are awaiting the results of the KVM run.
>>>>>
>>>>> We are using PR #1692 [2] as the master tracking PR to fix all defects
>> in the 4.8 branch.  Our goal is to get all non-skip tests to pass and then
>> merge this PR to the 4.8, 4.9, and master branches.  For each bug, we are creating a
>> JIRA ticket and adding a commit to the PR.  Currently, the branch for this
>> PR is in the shapeblue repo (the branch started with a much smaller fix
>> from Paul and we just kept using it).  However, if others are interested in
>> picking up defects, we will move it to the ASF repo.  Once the 4.8 branch is
>> stabilized, we plan to re-execute these tests on the 4.9 and master
>> branches as we expect that the 4.9 and master branches will have additional
>> issues.
>>>>>
>>>>> Since we are in a test freeze, I propose that no further PRs are
>> merged to the 4.8, 4.9, and master branches until they are stabilized.  The
>> following PRs will be re-based, re-tested, and merged to 4.8, 4.9.1.0,
>> and/or 4.10.0.0 post-stabilization:
>>>>>
>>>>>   * 1696
>>>>>   * 1694
>>>>>   * 1684
>>>>>   * 1681
>>>>>   * 1680
>>>>>   * 1678
>>>>>   * 1677
>>>>>   * 1676
>>>>>   * 1674
>>>>>   * 1673
>>>>>   * 1642
>>>>>   * 1624
>>>>>   * 1615
>>>>>   * 1600
>>>>>   * 1545
>>>>>   * 1542
>>>>>
>>>>> I recognize that this is a large backlog of contributions ready to merge,
>> and apologize for asking folks to wait.  However, given the current state of
>> the release branches, merging them before we complete fixing the smoke
>> tests would create a moving target that would further delay stabilization.
>>>>>
>>>>> Obviously, it is unlikely we will make the 10 October 2016 release
>> date for the 4.8.2.0, 4.9.1.0, and 4.10.0.0 releases.  At this point, it is
>> difficult to estimate the size of the schedule slip because we still have
>> issues to triage and test runs to complete.  I have created a wiki page [2]
>> to track progress on this effort.
>>>>>
>>>>> Does this approach sound reasonable?  Any suggestions to speed up this
>> process will be greatly appreciated as stabilizing and re-opening these
>> branches ASAP is critical for the community.
>>>>>
>>>>> Thanks,
>>>>> -John
>>>>>
>>>>> [1]: https://issues.apache.org/jira/browse/CLOUDSTACK-9518?jql=project%20%3D%20CLOUDSTACK%20AND%20fixVersion%20in%20(4.8.2.0)%20AND%20labels%20in%20(4.8.2.0-smoke-test-failure)
>>>>> [2]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65873020
>>>>>
>>>>>> On Sep 26, 2016, at 8:38 AM, Will Stevens <wstevens@cloudops.com> wrote:
>>>>>>
>>>>>> Yes, I think it is important that you or Rajani sign off on anything that
>>>>>> gets in while branches are frozen so you guys can stay on top of what goes
>>>>>> in.
>>>>>>
>>>>>> Thanks for all the hard work team.  :)
>>>>>>
>>>>>> *Will STEVENS*
>>>>>> Lead Developer
>>>>>>
>>>>>> *CloudOps* *| *Cloud Solutions Experts
>>>>>> 420 rue Guy *|* Montreal *|* Quebec *|* H3J 1S6
>>>>>> w cloudops.com *|* tw @CloudOps_
>>>>>>
>>>>>> On Mon, Sep 26, 2016 at 2:10 AM, John Burwell <john.burwell@shapeblue.com>
>>>>>> wrote:
>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> Per our release schedule [1], the 4.8, 4.9, and master branches are frozen
>>>>>>> for testing.  There are some straggling PRs that Rajani and I are working
>>>>>>> to merge.  Is it acceptable to everyone that for the next two (2) weeks,
>>>>>>> all PRs require not only 2 LGTMs, but approval by Rajani or me to be merged
>>>>>>> to these branches?  To be clear, we don’t have to perform the merges,
>>>>>>> simply give a thumbs up.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>>
>>
>>
>>






rohit.yadav@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 
