atlas-dev mailing list archives

From "venkata madugundu (JIRA)" <>
Subject [jira] [Commented] (ATLAS-511) Ability to run multiple instances of Atlas Server with automatic failover to one active server
Date Tue, 08 Mar 2016 09:46:40 GMT


venkata madugundu commented on ATLAS-511:

Hemanth Yamijala, thanks for uploading your thought process for HA. I have a few comments.
1. Type definitions for entities are mostly seen as unchanging, much like application
database schemas. Should Atlas consider a user-configurable feature toggle that skips
refreshing/reloading types when a passive instance becomes active?

I know that when the consumer application upgrades its functionality, there will be type-level
changes. In that case, maybe the application can send a special-purpose Atlas request to refresh
the type cache, coordinated by ZooKeeper, on all Atlas instances.
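
To make point 1 concrete, here is a minimal sketch of the toggle plus an explicit refresh request. Everything in it is hypothetical: `SharedVersion` is an in-memory stand-in for a counter that would live in ZooKeeper, and `AtlasInstance`, `reload_on_activation`, and `on_refresh_signal` are illustrative names, not existing Atlas classes or settings. The sketch also assumes that an instance with the toggle off still reloads on activation if it has missed a refresh signal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SharedVersion:
    """In-memory stand-in for a version counter kept in ZooKeeper (hypothetical)."""
    value: int = 0

    def bump(self) -> int:
        # Called when an application asks for a cluster-wide type refresh.
        self.value += 1
        return self.value


@dataclass
class AtlasInstance:
    """Hypothetical model of one server instance and its type cache."""
    load_types: Callable[[], Dict[str, dict]]  # reads all types from the backend store
    shared: SharedVersion
    reload_on_activation: bool = True          # the proposed user-facing toggle
    types: Dict[str, dict] = field(default_factory=dict)
    seen_version: int = -1
    reloads: int = 0

    def _reload(self) -> None:
        self.types = self.load_types()
        self.seen_version = self.shared.value
        self.reloads += 1

    def start(self) -> None:
        # Every instance loads the types once at startup.
        self._reload()

    def become_active(self) -> None:
        # With the toggle off, reload only if a refresh signal was missed.
        if self.reload_on_activation or self.seen_version != self.shared.value:
            self._reload()

    def on_refresh_signal(self) -> None:
        # Would be driven by a ZooKeeper watch in a real deployment.
        if self.seen_version != self.shared.value:
            self._reload()
```

With the toggle off, a passive instance that takes over keeps the cache it built at startup, and only the special-purpose refresh request (modelled here as `shared.bump()` followed by `on_refresh_signal()` on each instance) triggers a reload.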

2. How critical is the type cache in the context of a purely SCRUD API? That is, what would
the performance hit be if the types were not cached, but requested from the backend store each
time they are needed? I think metadata repositories tend to be more read-intensive in terms
of usage, so the performance of 'Search' is very important. With the Atlas DSL
query language, query validation (and even translation) needs to consult the types
(and their super type hierarchy). Does it make sense to evaluate query performance with and
without the type cache? How quickly can Atlas fetch a given set of types (the ones a query needs)
from the backend store? If that is quick enough (at least for a few classes of applications), then
maybe Atlas should provide a way to turn off the type cache.
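
The evaluation suggested in point 2 could start from a small harness like the one below. It is a sketch under stated assumptions, not a measurement of Atlas: `BACKEND` is a made-up in-memory type store, `TypeSource` and `latency_s` are hypothetical, and the simulated latency would need to be replaced by real backend reads before the timings mean anything. It resolves the super type hierarchy of one type repeatedly, with and without a cache, and counts backend reads.

```python
import time
from typing import Dict, List

# Hypothetical type store: type name -> direct super type names.
BACKEND: Dict[str, List[str]] = {
    "Asset": [],
    "DataSet": ["Asset"],
    "hive_table": ["DataSet"],
}


class TypeSource:
    """Looks up super types, optionally caching results (illustrative only)."""

    def __init__(self, use_cache: bool, latency_s: float = 0.0) -> None:
        self.use_cache = use_cache
        self.latency_s = latency_s      # simulated cost of one backend read
        self.backend_reads = 0
        self._cache: Dict[str, List[str]] = {}

    def super_types(self, name: str) -> List[str]:
        if self.use_cache and name in self._cache:
            return self._cache[name]
        time.sleep(self.latency_s)      # stand-in for a round trip to the store
        self.backend_reads += 1
        result = BACKEND[name]
        if self.use_cache:
            self._cache[name] = result
        return result


def hierarchy(source: TypeSource, name: str) -> List[str]:
    """Return the type and all its transitive super types."""
    seen: List[str] = []
    stack = [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(source.super_types(current))
    return seen


def measure(source: TypeSource, type_name: str, queries: int) -> float:
    """Time `queries` hierarchy resolutions and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(queries):
        hierarchy(source, type_name)
    return time.perf_counter() - start
```

Comparing `measure(TypeSource(True, 0.001), "hive_table", 100)` against the same call with `TypeSource(False, 0.001)` gives a first feel for the gap; the `backend_reads` counters show how many store round trips each mode needed.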

As HA (and multiple active instances) is important for our Atlas adoption, can I be of
any help in addressing specific child tasks? Please let me know, as you are in the best
position to decide which ones can be delegated to other contributors. I have been using the Atlas
REST API for quite some time now, around 2-3 months. We consume the Atlas REST
API using a standard HTTP client rather than the AtlasClient API. I have a fair understanding
of the DSL (having written a query rewriter for our multi-tenancy evaluation).

> Ability to run multiple instances of Atlas Server with automatic failover to one active
> ----------------------------------------------------------------------------------------------
>                 Key: ATLAS-511
>                 URL:
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>         Attachments: HADesign.pdf
> One of the most important components that only supports active-standby mode currently
is the Atlas server which hosts the API / UI for Atlas. As described in the [HA Documentation|],
we currently are limited to running only one instance of the Atlas server behind a proxy service.
If the running instance goes down, a manual process is required to bring up another instance.
> In this JIRA, we propose to have an ability to run multiple Atlas server instances. However,
as a first step, only one of them will be actively processing requests. To have a consistent
terminology, let us call that server the *master*. Any requests sent to the other servers
will be redirected to the master.
> When the master suffers a partition, one of the other servers must automatically become
the master and start processing requests. What this mode brings us over the current system
is the ability to automatically fail over the Atlas server instance without any manual intervention.
Note that this can be arguably called an [active/active setup|]
> ATLAS-488 was raised to support multiple active Atlas server instances. While that would
be ideal, we have to learn more about the underlying system behavior before we can get there,
and hopefully we can take smaller steps to improve the system systematically. The method proposed
here is similar to what is adopted in many other Hadoop components, including HDFS NameNode,
HBase HMaster, etc.
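
The behaviour described in the quoted issue (one master processes requests, the other servers redirect to it, and another server takes over when the master is lost) can be modelled roughly as follows. This is only an illustration: the election rule (lowest live address wins) is a deterministic stand-in for a real leader election, the HTTP status codes are assumptions, and `Cluster`, `Server`, and the host names are hypothetical rather than taken from the design document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Server:
    address: str
    alive: bool = True


class Cluster:
    """Toy model of several server instances with a single master."""

    def __init__(self, addresses: List[str]) -> None:
        self.servers: Dict[str, Server] = {a: Server(a) for a in addresses}
        self.master: Optional[str] = None
        self.elect()

    def elect(self) -> Optional[str]:
        # Stand-in for leader election: lowest live address becomes master.
        live = sorted(a for a, s in self.servers.items() if s.alive)
        self.master = live[0] if live else None
        return self.master

    def fail(self, address: str) -> None:
        # Mark a server as lost; re-elect automatically if it was the master.
        self.servers[address].alive = False
        if address == self.master:
            self.elect()

    def handle(self, address: str, path: str) -> Tuple[int, str]:
        # Requests reaching a non-master server are redirected to the master.
        if not self.servers[address].alive or self.master is None:
            return 503, "service unavailable"
        if address != self.master:
            return 302, self.master + path
        return 200, "processed " + path
```

For example, with two instances `http://atlas1:21000` and `http://atlas2:21000`, a request to the second is redirected to the first until `fail("http://atlas1:21000")` is called, after which the second instance processes requests itself without manual intervention.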

This message was sent by Atlassian JIRA
