atlas-dev mailing list archives

From Hemanth Yamijala <hyamij...@hortonworks.com>
Subject Re: Atlas service scalability
Date Mon, 25 Jan 2016 17:13:56 GMT
Nigel,

Here is what I’ve gathered from working with other developers on this project. I hope the more
knowledgeable devs on the project will weigh in if I’ve got anything incomplete or wrong:

* The type system is cached in memory. As types are modified by registering new models etc.,
this cache is updated and the changes are also persisted to the backend. There is no mechanism
today to sync these changes across multiple instances.

* The backing metadata store and its synchronization. I am not considering the embedded BerkeleyDB
or ElasticSearch options here, as these are not meant for production instances (although
they would only make the problem harder). With HBase in particular, an initial investigation into
its performance as the backend for Titan showed that Titan takes locks directly in HBase to allow
concurrent modifications across multiple Titan clients (i.e. multiple Atlas servers), and that
this locking had a performance impact. With a single active instance, this can be mitigated by
using in-process locks instead.
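
To make the two points above concrete, here is a rough sketch in Java. It is not Atlas code: SharedBackend, ServerInstance, registerType and knowsType are hypothetical names made up for illustration, and the backend is just an in-memory map standing in for the real store. It only shows the shape of the problem, namely a per-instance cache that is written through to a shared backend under an in-process lock, with nothing telling the other instance that a type was added.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the shared metadata store (not a real Titan/HBase client).
class SharedBackend {
    final Map<String, String> types = new ConcurrentHashMap<>();
}

// Hypothetical stand-in for one server instance; these are not Atlas's actual classes.
class ServerInstance {
    // Per-instance, in-memory type cache.
    private final Map<String, String> typeCache = new ConcurrentHashMap<>();
    // In-process lock: serializes writers inside this JVM only.
    private final ReentrantLock writeLock = new ReentrantLock();
    private final SharedBackend backend;

    ServerInstance(SharedBackend backend) {
        this.backend = backend;
    }

    // Write-through: persist to the backend, then update the local cache.
    void registerType(String name, String definition) {
        writeLock.lock();
        try {
            backend.types.put(name, definition);
            typeCache.put(name, definition);
        } finally {
            writeLock.unlock();
        }
    }

    // Lookups are served from the local cache only.
    boolean knowsType(String name) {
        return typeCache.containsKey(name);
    }
}

class StaleCacheDemo {
    public static void main(String[] args) {
        SharedBackend backend = new SharedBackend();
        ServerInstance a = new ServerInstance(backend);
        ServerInstance b = new ServerInstance(backend);

        a.registerType("hive_table", "v1");

        System.out.println(a.knowsType("hive_table"));              // true
        System.out.println(b.knowsType("hive_table"));              // false: b's cache was never told
        System.out.println(backend.types.containsKey("hive_table")); // true
    }
}
```

In this sketch the lock only protects writers within one process, which is enough when there is a single active instance. With two active instances, the second one would need some way to learn about the change (or to bypass its cache), and writes would again need coordination across processes.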

Hope that helps.

Thanks
hemanth


On 1/25/16, 10:32 PM, "Nigel Jones" <jonesn@uk.ibm.com> wrote:

>On 25/01/2016 12:21, Venkata R Madugundu wrote:
>> Hi Hemanth,
>> OK. I am assuming there is some sort of optimization (likely caching) done
>> in the Atlas code, and hence the statefulness of Atlas, which does not allow
>> multiple active instances.
>
>Hemanth,
>  Can you elaborate at all on Atlas's statefulness? What would you see
>as some of the biggest challenges (apart from coding time!) in 
>addressing the restriction? What areas of the code demonstrate this most 
>clearly?
>
>Thanks
>Nigel.
>
>
>