stratos-dev mailing list archives

From Imesh Gunaratne <im...@apache.org>
Subject Re: Clustered deployments of Stratos
Date Mon, 18 May 2015 14:23:01 GMT
Thanks Shaheed! I will verify the second problem where Stratos is not
detecting manually terminated members.

Thanks

On Mon, May 18, 2015 at 3:39 PM, Shaheedur Haque (shahhaqu) <
shahhaqu@cisco.com> wrote:

> Ack. We are just in the middle of getting synced up again to master, and
> it sounds like that might fix the persistence issue.
>
>
>
> I guess that leaves the Cartridge Agent reconnect side of the problem…
>
>
>
> *From:* Lahiru Sandaruwan [mailto:lahirus@wso2.com]
> *Sent:* 17 May 2015 03:06
>
> *To:* dev
> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Similarly, it would be a great help if you could verify all these issues
> against the latest code, since we have been fixing a lot of issues in recent
> days as a result of RC1 testing.
>
>
>
> Thanks.
>
>
>
> On Fri, May 15, 2015 at 9:42 PM, Imesh Gunaratne <imesh@apache.org> wrote:
>
> Hi Shaheed,
>
>
>
> Thanks for the quick response. After analyzing the results you have
> provided again, it looks like only the deployment policies are missing
> after the failover. We have fixed this issue in commit
> revision: 0c515aa013850575ddcfa2e299da5f0ec250ebc3
>
>
>
>
> http://mail-archives.apache.org/mod_mbox/incubator-stratos-commits/201504.mbox/%3C22eed4e8639c401a8fda637fa6bb4501@git.apache.org%3E
>
>
>
> Would you mind verifying whether this is there in your runtime?
>
>
>
> Thanks
>
>
>
>
>
> On Fri, May 15, 2015 at 9:02 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> The latter; we never have both Stratos instances running.
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 15 May 2015 16:17
> *To:* dev
> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
>
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Do you have both active and passive Stratos nodes running at the same time
> or do you start the passive node once the active node goes down?
>
>
>
> Thanks
>
>
>
> On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Hi Imesh,
>
>
>
> I finally got round to a proper series of tests, and here are the
> conclusions:
>
>
>
> - In Stratos 4.0, after a Pacemaker driven failover, the newly Active
>   Stratos has lost all Cartridge Definitions.
> - In current [1] Stratos 4.1, after a Pacemaker driven failover, the
>   newly Active Stratos:
>   - Has lost all Deployment Policies.
>   - Has lost contact with the Cartridge Agents, and all VMs are stuck
>     with whatever state they had before the failover.
> - Note: I have not verified if Cartridge Groups are lost or not.
>
>
>
> I include the test results below at [2] and [3]. I am concerned as to
> whether 4.1 is ready for GA on this basis, so though more testing is no
> doubt possible (e.g. Cartridge Groups) I wanted to get this info to the
> list ASAP.
>
>
>
> Thanks, Shaheed
>
>
>
> [1] A recent build somewhere between beta 1 and beta 2, but I don’t think
> any relevant fixes have been made in master.
>
>
>
> [2] Persistence test output from Stratos 4.1. Note:
>
>
>
> 1. In the build I have, the CLI is broken for a couple of commands;
>    these are supplemented by direct “curl” commands further down.
> 2. I’ve used one of our commands to show the instances and their state
>    for a given application, since there is no compact JSON or convenient
>    Stratos CLI command for that.
>
>
>
> *PERSISTENCE TEST, BEFORE FAILOVER*
>
> *================================*
>
>
>
> stratos> list-tenants
>
> Tenants:
>
>
> +-----------------------+-----------+------------------+--------+------------------------------+
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
> +-----------------------+-----------+------------------+--------+------------------------------+
> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
> +-----------------------+-----------+------------------+--------+------------------------------+
>
>
>
> stratos> list-network-partitions
>
> Network partitions found:
>
> +----------------------+----------------------+
>
> | Network Partition ID | Number of Partitions |
>
> +----------------------+----------------------+
>
> | RegionOne            | 1                    |
>
> +----------------------+----------------------+
>
>
>
> stratos> list-deployment-policies
>
> Deployment policies found:
>
> +-------------------+---------------+
>
> | ID                | Accessibility |
>
> +-------------------+---------------+
>
> | static-2-ha       | 1             |
>
> +-------------------+---------------+
>
> | autoscale-2-10-ha | 1             |
>
> +-------------------+---------------+
>
> | autoscale-1-5     | 1             |
>
> +-------------------+---------------+
>
> | static-1          | 1             |
>
> +-------------------+---------------+
>
>
>
> stratos> list-application-policies
>
> Error in listing application policies
>
> No application policies found
>
>
>
> stratos> list-autoscaling-policies
>
> Error in listing autoscaling policies
>
> No autoscaling policies found
>
>
>
> stratos> list-cartridges
>
> Cartridges found:
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +-----------------+-----------------+----------+
>
> | Application ID  | Alias           | Status   |
>
> +-----------------+-----------------+----------+
>
> | cartridge-proxy | cartridge-proxy | Deployed |
>
> +-----------------+-----------------+----------+
>
> | cisco-sample-vm | cisco-sample-vm | Deployed |
>
> +-----------------+-----------------+----------+
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>
>
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>
>
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
>
>
>
>
> *PERSISTENCE TEST, AFTER FAILOVER*
>
> *===============================*
>
>
>
> stratos> list-tenants
>
> Tenants:
>
>
> +-----------------------+-----------+------------------+--------+------------------------------+
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
> +-----------------------+-----------+------------------+--------+------------------------------+
> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
> +-----------------------+-----------+------------------+--------+------------------------------+
>
>
>
> stratos> list-network-partitions
>
> Network partitions found:
>
> +----------------------+----------------------+
>
> | Network Partition ID | Number of Partitions |
>
> +----------------------+----------------------+
>
> | RegionOne            | 1                    |
>
> +----------------------+----------------------+
>
>
>
> stratos> list-deployment-policies
>
> No deployment policies found
>
>
>
> stratos> list-application-policies
>
> Error in listing application policies
>
> No application policies found
>
>
>
> stratos> list-autoscaling-policies
>
> Error in listing autoscaling policies
>
> No autoscaling policies found
>
>
>
> stratos> list-cartridges
>
> Cartridges found:
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +-----------------+-----------------+----------+
>
> | Application ID  | Alias           | Status   |
>
> +-----------------+-----------------+----------+
>
> | cartridge-proxy | cartridge-proxy | Deployed |
>
> +-----------------+-----------------+----------+
>
> | cisco-sample-vm | cisco-sample-vm | Deployed |
>
> +-----------------+-----------------+----------+
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>
>
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>
>
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
>
>
> [3] Cartridge test output from Stratos 4.1. Note:
>
>
>
> 1. We do not use a VIP for Stratos, either for 4.0 or 4.1.
> 2. We expect the Cartridge Agent to use a DNS lookup when it ends up
>    reconnecting, and this worked just fine in Stratos 4.0.
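
A minimal sketch of the reconnect behaviour described in point 2, resolving the name again before every attempt so that a changed DNS record is picked up after a failover. This is a generic TCP illustration in Python, not the Cartridge Agent's actual code; the function name, its parameters, and the host and port you would pass in are all assumptions.

```python
import socket
import time


def connect_with_fresh_lookup(host, port, attempts=5, delay=2.0,
                              resolve=socket.getaddrinfo,
                              connect=socket.create_connection,
                              sleep=time.sleep):
    """Try to connect, looking `host` up again before every attempt.

    Returns whatever `connect` returns, or raises the last OSError seen.
    `resolve`, `connect` and `sleep` are injectable so the loop can be
    exercised without a network (hypothetical parameters).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            # Resolve on every attempt instead of caching the first answer.
            infos = resolve(host, port, type=socket.SOCK_STREAM)
            for family, socktype, proto, canonname, sockaddr in infos:
                try:
                    return connect(sockaddr[:2], timeout=5)
                except OSError as exc:
                    last_error = exc
        except OSError as exc:  # lookup failures (socket.gaierror) land here
            last_error = exc
        if attempt < attempts - 1:
            sleep(delay)
    raise last_error or OSError("no addresses returned for %s" % host)
```

The point of the design is only that the lookup sits inside the retry loop; a client that resolves once at startup would keep retrying the address of the old active node.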
>
>
>
> *CARTRIDGE TEST, BEFORE FAILOVER*
>
> *==============================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
> *CARTRIDGE TEST, AFTER FAILOVER*
>
> *=============================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
> *CARTRIDGE TEST, AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES*
>
>
> *===================================================================================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
>
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 14 May 2015 20:34
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> It would be better to use the REST API to query and see whether the
> relevant entities are persisted. Since data is stored in binary format in
> the registry it would be difficult to query the database and verify this.
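
A minimal sketch of that check, not part of Stratos: it pulls the "id" fields out of REST responses captured before and after a failover and reports what disappeared. The function names and the "deploymentPolicies" key are made up for illustration; the sample IDs and the JSON body are copied from the test output quoted elsewhere in this thread.

```python
import json


def ids_from_response(body):
    """Extract the "id" of every object in a JSON array response."""
    return [item["id"] for item in json.loads(body)]


def missing_after_failover(before, after):
    """Return {entity type: sorted ids present before but absent after}."""
    lost = {}
    for entity_type, ids_before in before.items():
        gone = set(ids_before) - set(after.get(entity_type, []))
        if gone:
            lost[entity_type] = sorted(gone)
    return lost


# Response body as returned by /api/autoscalingPolicies in the quoted test.
autoscaling_body = ('[{"id":"economyPolicy","instanceRoundingFactor":0,'
                    '"isPublic":false,"loadThresholds":""}]')

# Snapshot keys are illustrative labels, not confirmed endpoint names.
before = {
    "deploymentPolicies": ["static-2-ha", "autoscale-2-10-ha",
                           "autoscale-1-5", "static-1"],
    "autoscalingPolicies": ids_from_response(autoscaling_body),
    "applicationPolicies": ["default-iaas"],
}
after = {
    "deploymentPolicies": [],
    "autoscalingPolicies": ids_from_response(autoscaling_body),
    "applicationPolicies": ["default-iaas"],
}

print(missing_after_failover(before, after))
```

With the IDs above, only the deployment policies are reported as lost.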
>
>
>
> On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> I looked at REG_RESOURCEs (as well as a few others) but I’m afraid I am
> going to need more specifics.
>
>
>
> For example, what query would you recommend to look at say deployment
> policies and cartridge definitions?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 09 May 2015 09:08
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Yes, you could refer to the tables that have the prefix "REG_".
>
>
>
> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Can you suggest what tables to look at?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 07 May 2015 18:00
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for the clarification! Maybe the problem is with the MySQL
> active-passive configuration.
>
>
>
> I understand that you are switching the same OpenStack volume from the
> active node to the passive node (when the passive node becomes active), so
> technically it should work. Maybe we need to investigate this problem
> further by analysing whether data is persisted properly in the active node
> before the passive node becomes active.
>
>
>
> Thanks
>
>
>
> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> The data is not synchronised between the active and passive nodes. For
> clarity, this is the HA model we had, much as described in the blog:
>
>
>
> - 2 nodes, with Pacemaker in active-passive mode.
> - Under Pacemaker control:
>   - We run MySQL in active-passive mode, using a single OpenStack volume
>     which we attach/reattach as the active role moves around nodes.
>   - As Pacemaker moves the volume, and thus MySQL, around on node
>     failures, ActiveMQ and Stratos are moved around too.
>   - Thus, everything operates in active-passive mode.
>
>
>
> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
> Stratos JVM on the old active node has gone with the node, and Pacemaker
> starts up a new Stratos JVM on what used to be the passive node), we found
> that the Cartridge Definition objects were missing and, as a clumsy
> workaround [1], we had to replay the stored copies of them into Stratos
> using the REST API.
>
>
>
> With Stratos 4.1, using the new object names, early indications are that
> *Deployment Policies* and *Application Deployment* policies are lost as the
> active fails over to the passive. If anything, these objects are more likely
> to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on
> the fly (min/max etc).
>
>
>
> Thanks, Shaheed
>
>
>
> [1] Clearly, this loses any changes that were not in the stored copies.
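
To make the caveat in [1] concrete, here is a rough sketch (not Stratos code) that compares the stored copies against what is currently live before replaying anything. It separates objects that are simply missing from objects that exist but differ from the stored copy, which is where on-the-fly changes would be lost or overwritten. The function name and the min/max values are hypothetical; only the policy IDs come from this thread.

```python
def plan_replay(stored, live):
    """Split stored definitions into those to replay and those that drifted.

    stored, live: dicts mapping object id -> definition (any comparable value).
    Returns (to_replay, drifted) as two sorted lists of ids.
    """
    # Missing from the live system: candidates for replay via the REST API.
    to_replay = sorted(oid for oid in stored if oid not in live)
    # Present in both but different: the stored copy is stale or was edited.
    drifted = sorted(oid for oid in stored
                     if oid in live and stored[oid] != live[oid])
    return to_replay, drifted


# Hypothetical definitions; the real objects carry more fields than this.
stored = {
    "static-1": {"min": 1, "max": 1},
    "autoscale-1-5": {"min": 1, "max": 5},
}
live = {
    "autoscale-1-5": {"min": 2, "max": 5},  # tweaked after the copy was stored
}

to_replay, drifted = plan_replay(stored, live)
print("replay:", to_replay)
print("drifted:", drifted)
```

Anything in the second list needs a human decision; a blind replay would silently pick the stored values.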
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 03 May 2015 06:43
> *To:* dev@stratos.apache.org
>
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for taking time to test this!
>
>
>
> Just to clarify the exact problem, do you mean that data is not
> synchronized between the active and passive nodes, or that it is not
> persisted in the active node?
>
>
>
> Thanks
>
>
> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <shahhaqu@cisco.com>
> wrote:
>
>
> I have been looking into our use of Linux HA to set up an Active-Passive
> configuration. Testing indicates that in 4.1 (beta1), several objects seem
> not to be persisted properly. This includes at least:
>
> - Cartridges
> - Deployment policies
>
> Am I missing something? Is it safe to work around this by replaying those
> objects?
>  ------------------------------
>
> *From:* Imesh Gunaratne [imesh@apache.org]
> *Sent:* 23 April 2015 10:47
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
>
>
> Currently N-way clustering is still not possible with CC, AS & SM. We
> completed the initial phase of this feature, but the feature as a whole is
> not yet finished. You could refer to the mail thread "[Discuss] Clustering
> Feature Implementation for 4.1.0-Alpha Release" for details.
>
>
>
> However, at present [1] is valid. We could use Linux HA and deploy CC, AS
> and SM in Active-Passive mode.
>
>
>
> Thanks
>
>
>
>
>
>
>
> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Hi,
>
>
>
> We currently try to achieve HA with Stratos using something so unpleasant
> that I won’t even describe it here :). It has been suggested that Stratos
> has, for a while now, supported a clustered mode of deployment where, given
> N servers:
>
>
>
> - The SM, AS and MB operate in an N-way clustered mode
> - The CEP operates in an N-way load-sharing mode
> - The Cartridge Agents can react to a failure in one of the N CEPs by
>   failing over to one of the other N-1 remaining servers
>
>
>
> In looking for documentation on how to set this up, I came across these
> two write-ups [1] and [2]. Questions:
>
>
>
> - Both these documents mention only using N=2. Is that still correct?
> - [1] seems recently written, and [2] is a little older but not much.
>   Are both documents still regarded as current?
>
>
>
> Also, I’d love to hear any other experiences people have of running
> configurations like this.
>
>
>
> Thanks, Shaheed
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>
> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
> --
> Lahiru Sandaruwan
>
> Committer and PMC member, Apache Stratos,
> Senior Software Engineer,
> WSO2 Inc., http://wso2.com
>
> lean.enterprise.middleware
>
> phone: +94773325954
> email: lahirus@wso2.com blog: http://lahiruwrites.blogspot.com/
> linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
>
>
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos
