stratos-dev mailing list archives

From Imesh Gunaratne <im...@apache.org>
Subject Re: [DISCUSS] Achieving HA or 100% (99.999%) uptime for apache stratos
Date Sat, 18 Oct 2014 06:49:52 GMT
Hi Martin,

Please find my comments inline:

On Fri, Oct 17, 2014 at 11:38 PM, Martin Eppel (meppel) <meppel@cisco.com>
wrote:

>  I would like to discuss what it would take to achieve 100% uptime for
> stratos in a production environment (aiming high to reach the five nines)
> -  if it had been discussed before please point me to the email thread.
>

Unlike with other software systems, a brief downtime of the PaaS itself
might not affect the deployed services, because it does not bring the
services (running instances) down. However, yes, we need to provide 100%
uptime.
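
To put the five nines target in concrete terms, here is a rough Python
sketch that converts an availability percentage into a yearly downtime
budget. The function name and the 365-day year are assumptions made for
this illustration; nothing here comes from Stratos itself.

```python
def downtime_budget_minutes(availability_percent, days=365):
    """Return the downtime allowed, in minutes, over a period of `days` days."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = days * 24 * 60
    # The unavailable fraction of the period is the downtime budget.
    return total_minutes * (100 - availability_percent) / 100


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(target)
        print(f"{target}% uptime allows {budget:.2f} minutes of downtime per year")
```

For 99.999% this works out to roughly 5.26 minutes per year, which has to
cover maintenance and upgrades as well as failures.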

>
>
> The goal is to identify recommended deployment scenarios and possible
> shortcomings (or readiness ) to reach  five nines.
>
>
>
> This includes the following scenarios:
>
> + maintenance cycles,
>
> + upgrades,
>
> + hardware and software failures
>
> + scalability
>
> + ... ?
>
+1. We need to address all of these.

>
>
> Generally, it seems the suggested system model to reach 100% uptime (or
> the highest possible uptime) is a n way redundancy model with multiple
> active / standby assignments.
>
>
>
> I looked in the HA for 4.1,  see web link
> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Providing+High+Availability+for+Stratos
> :
>
>
>
> Stratos allows for 2 deployment models, single JVM and distributed
> deployment model.
>
>
>
> Which one will be better suited to reach the stated goal of 100% uptime /
> n way redundancy model ?
>

The deployment model is about the level of capacity Stratos can provide
(the number of instances it can support), not the level of HA. We should
be able to provide the same level of HA in both deployment models.
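
On the n-way redundancy question, one simplified way to reason about it is
to treat each redundant node as failing independently, so that the combined
availability is 1 - (1 - a)^n. The sketch below assumes independent
failures and instantaneous failover, which a real deployment will not fully
meet. The function names and the sample per-node availability are
hypothetical and are not measured Stratos figures.

```python
def combined_availability(node_availability, n):
    """Availability of n redundant nodes, assuming independent failures."""
    if not 0.0 < node_availability <= 1.0:
        raise ValueError("node_availability must be in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The group is down only when all n nodes are down at the same time.
    return 1.0 - (1.0 - node_availability) ** n


def nodes_needed(node_availability, target):
    """Smallest n whose combined availability reaches `target` (target < 1)."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 1
    while combined_availability(node_availability, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical example: each node is available 99% of the time.
    print(nodes_needed(0.99, 0.99999))
```

Under these assumptions the redundancy count depends on per-node
availability and not on which deployment model is chosen.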

>
>
> According to the link (and please correct me if I am wrong), it seems that
> currently the components to allow n-way redundancy are:
>
>
>
> + BAM (doc is not updated yet, see
> https://docs.wso2.com/display/CLUSTER420/Clustering+BAM+2.5.0  ?
>

Yes. I believe we are still using BAM 2.4.1; please see the link below:
https://docs.wso2.com/display/CLUSTER420/Fully-Distributed%2C+High-Availability+BAM+Setup


>
>
> + core component (Manager, Autoscaler, Cloud controller) in active/passive
> mode through Linux HA
>    RDBMS used for registry needs to support n-way redundancy as well
>

I'm currently doing a POC on this using Pacemaker/Heartbeat and will
provide details soon:
https://issues.apache.org/jira/browse/STRATOS-897


>
> + ActiveMq
>    multiple models suggested, Zookeeper,  shared DB or shared file
> systems. Which one would be recommended to achieve n-way redundancy ?
>

Yes, we need to investigate this further.

>
>
> CEP seems to allow a 2 node configuration only or is there support for
> n-way redundancy ?
>

In a distributed cache mode deployment it supports many CEP instances. I
will check on this further:
https://docs.wso2.com/display/CLUSTER420/Clustering+Complex+Event+Processor#ClusteringComplexEventProcessor-Distributedcachemodedeployment


>
>
> Stratos Load Balancer, lists some caveat like session affinity not
> supported in distributed environment, n-way ready ?
>
>
>
Yes, the load balancer still does not have features to replicate session
information in a distributed deployment.


Thanks

-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos
