directory-dev mailing list archives

From Emmanuel Lecharny <elecha...@gmail.com>
Subject Re: Streaming / Serializing Big Objects
Date Fri, 08 Sep 2006 22:38:10 GMT
Ole Ersoy wrote:

><snip/>
>
>Just a quick terminology clarification - when I say
>cache I mean in memory representations and when I say
>persisted I mean written to disk.
>
>By directory tree I mean all the information that ADS
>is intended to provide, regardless of precisely how it
>is persisted or managed.  So I think we are on the
>same page here.
>  
>
Sure !

>So if all the information were in a dom like tree,
>then something like EMF OCL could be used to query it.
>  
>
Yep, but this is not exactly how the data is stored. As we need to do
transversal searches (like finding every entry whose name starts with
'ACME*'), using a DOM tree to represent the data would be particularly
inefficient. It would be a different story if we were to dump the
content of the database but, even then, the best solution is to use
a standard representation like LDIF or DSML (eerrrkkk).
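
A minimal sketch of why a sorted index suits this kind of search better
than walking a tree: with the values kept in sorted order, a 'starts
with' query becomes a single range scan. The class name PrefixIndex and
its methods are hypothetical, chosen for illustration only; this is not
how ADS actually stores its data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: a sorted index on one attribute, not the ADS storage layout.
class PrefixIndex {
    // Maps an attribute value (e.g. a name) to the DNs of the entries holding it.
    private final TreeMap<String, List<String>> index = new TreeMap<>();

    void add(String value, String dn) {
        index.computeIfAbsent(value, k -> new ArrayList<>()).add(dn);
    }

    // Returns the DNs of all entries whose value starts with the given prefix.
    List<String> startsWith(String prefix) {
        List<String> result = new ArrayList<>();
        // Every key with this prefix sorts inside [prefix, prefix + U+FFFF),
        // assuming no stored value contains the character U+FFFF itself.
        SortedMap<String, List<String>> range =
                index.subMap(prefix, prefix + Character.MAX_VALUE);
        for (List<String> dns : range.values()) {
            result.addAll(dns);
        }
        return result;
    }
}
```

The range scan only touches the matching keys, whereas a DOM-like tree
would have to be visited node by node to answer the same question.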

>This may take up more of a memory footprint, or the
>queries could be slower, but what if it's just as fast
>or faster.  Then ADS would all of a sudden have a lot
>more developers working on one of its building blocks.
>  
>
Well, hmmm, what I can say from experience is that manipulating an XML
document is really slower than any other textual representation, by an
order of magnitude. Beware, I'm not saying that XML is bad in essence,
just that you should use the right technology for each problem. <OT> :
to send data to another human, I'm pretty confident that ASN.1 PER
encoding is quite a way to attract interest from the NSA, who may think
that I'm sending encrypted data ;). XML is then much better. And I'm not
really convinced that the technology used to build an Ldap server can
increase the number of developers. However, I can be totally wrong :).
Using Ajax, AOP, Rest and Hibernate may have some advantage because of
the buzz around those technologies, but I don't really see how they can
help implement correctly and efficiently the basic operations we need.
I prefer to dedicate a lot of time to correct algorithms and design,
because this is essential. IMHO, of course ! </OT>

> <snip/>
>
>Yeah!  Lets go with the GMail one!!!! :-)
>  
>
eh eh... We had fun last night with Alex discussing how we can use this 
distributed resource for free :)

>So I think we are thinking pretty much the same thing
>here, and that's what the StateManager would do.
>  
>
well, hmm, StateManager doesn't mean a lot to me :( Sorry, man...

>It could even be pluggable, so for instance different
>state managers for different persistence mechanisms.
>
>In the end we are just reading and writing data, and
>that's the job of the StateManager.
>
>Whether it reads it all at once, a little here or a
>little there, is up to it.
>
>If a telecommunications company that wants lightning
>fast queries is using ADS, then they probably would love
>to see ADS restored and run from a single file that is
>in memory for all queries.
>  
>
Ahhh... Maybe you are calling 'StateManager' what we call 'Partition'.
A Partition, for us, is a 'location' where some data are stored, with a
common root. We may have different backends, and different strategies to
store data. So, here, StateManager = Partition. Am I right ?

>But if it's an authentication service where queries can
>take their own sweet time, then maybe the IT dept
>would rather just get 1 server with a gigantic drive
>and have ADS query a persistent data source when it
>needs stuff, and nothing is cached in memory.
>  
>
I think it's up to whoever deploys ADS to decide whether they want an
in-memory partition or not. Just imagine a client-oriented Ldap - I
worked for a client who was storing its 70 000 000 users in an Ldap
server - then in-memory was out of the question, and a cache was just
useless and would even cost more than what we could gain. So, basically,
yes, you are perfectly right.

>So I think we are thinking the same thing, the only
>question is what is the best solution that minimizes
>the in memory foot print, regardless of the size of
>the cache, maximizes maintenance ease and feature
>development / modularity.
>  
>
Yes, I think you are right when you think we are thinking the same thing :)

We have had this kind of discussion (best solution, etc) a few times. I
have a perception of what kinds of Ldap usage we can meet in the real world :
- small Ldap database : typically small companies, or applications which
use Ldap to manage a limited number of users : up to a few thousand entries
- medium Ldap database : medium to large companies which use Ldap as the IM
node. Around a few hundred thousand entries
- client-centric Ldap database : very large Ldap database used to store
information about the clients (like whitePages/yellowPages). Could store
hundreds of millions of entries.
- application-centric Ldap database : database with a lot of relations,
typically used by complex applications.

All those kinds of Ldap usage - and the list is not limited to these
four - deserve a specific customization. Reliable database, in-memory
database, fast disk storage, huge clusters of disks, etc... All those
possibilities have to be addressed. But, well, we are barely at the
1.0-RC4 version :) That leaves a *hell* of a lot of opportunities to add
all of that stuff, starting right now :)

><snip/>
>
>Yeah - lets just call them things...that need to be as
>fast as possible to read and as fast as possible to
>write.
>
>Of course ease of development and maintenance should be
>considered vs. the speed considerations.
>  
>
That's a very valid concern. I won't buy a 20% improvement if it means
bloated code... Of course, if we are talking about an order of magnitude
in the core functions of the server, that's another story :)

> <snip/>
>
>
>Prevayler claimed to support transactions inherently
>just by the very nature of what it does...which makes
>sense...
>  
>
There are many products out there which claim the same thing :)... The
question is *not* 'is Prevayler good or bad', but what are our
constraints with respect to Ldap features, and which is/are the best
tool(s) to meet them. Atm, I do think that we are in the 'think tank'
phase. We are trying to gather our needs, and then we will be able to
select the best tool for each 'StateManager/Partition' implementation.
Here, again, testing and experimentation are a must.

> <snip/>
>
>>If we 'mutate' the directory tree, the cache should
>>be updated 
>>    
>>
>
>We mean persistent source here right - when I say
>cache I mean in memory data...
>  
>
I'm not sure that we need persistence here. We have a cache (though not
necessarily), and this cache has to be updated to reflect the reality
stored in the database. As long as the 'mutated' data have not been
written to the storage, we will be able to roll back. Then, the
persistent data may simply be considered as volatile data, as long as we
keep the initial modification request. I don't know. We need to dig
deeper into this point.

><snip/>
>  
>
>Yes I see what you are saying with respect to
>explaining the exact mutation process that is
>happening.
>
>In the end though something changes and we need to
>capture that change somehow, so that if the server
>goes down we can get back to the same operating state.
>  
>
yes.

>We could be doing jdbc transactions right when the
>server goes down, and if the transaction is not
>complete then we still can't completely recover, but
>come close...
>  
>
This is an option, if we have an RDBMS-backed partition. And in this
case, even if the server goes down, we can consider that the request
will be automatically rolled back by the RDBMS, so we won't have any
problem. If we want to be sure that the modification request will be
executed, then we must add another mechanism : we should implement a way
to persist requests (we can have a journal of requests, where each
committed request is marked 'DONE', so that if the server goes down,
we replay the requests not marked 'DONE').
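
The journal idea can be sketched roughly as follows. This is an
in-memory toy with a hypothetical class name (RequestJournal) and
hypothetical request strings; a real journal would have to append each
record to a file and force it to disk before the request is applied.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch of a request journal; a real one would be on disk.
class RequestJournal {
    // Requests in arrival order, keyed by a sequence number.
    private final Map<Long, String> requests = new LinkedHashMap<>();
    // Sequence numbers of the requests already marked 'DONE'.
    private final Set<Long> done = new HashSet<>();
    private long nextId = 1;

    // Record the request BEFORE applying it to the backend.
    long log(String request) {
        long id = nextId++;
        requests.put(id, request);
        return id;
    }

    // Mark the request 'DONE' once the backend has committed it.
    void markDone(long id) {
        done.add(id);
    }

    // After a crash: the requests to replay, in their original order.
    List<String> pending() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : requests.entrySet()) {
            if (!done.contains(e.getKey())) {
                result.add(e.getValue());
            }
        }
        return result;
    }
}
```

One thing to keep in mind: if the server dies after the commit but
before the 'DONE' mark is written, the request will be replayed, so
either the replay must be harmless when applied twice, or the mark has
to be written atomically with the commit.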

Again, we have many options. The first step is to list them all (or at 
least the most wanted options).

>I think our general conversation with respect to
>journaling, etc. applies here.
>  
>
It is a part of the problem/solution :)

> <snip/>
>
>Yeah - Exactly...
>
>That's why I suggested the EMF API, because it already
>has support for a lot of stuff like that.  I need to
>get some examples worked up asap.
>  
>
Every experiment is warmly welcome, if it can bring a solution to a
known problem ! For unknown problems, there is no solution that
works :)

> <snip/>
>
>>>      
>>>
>>We definitively have to implement a RDBMS backend.
>>RDBMS offer all those 
>>
>>    
>>
>
>Dang - The message got truncated...now I gotta go and
>see what else you said somewhere else...
>  
>
I reposted the mail to the ML, where I should have posted it in the
first place, instead of doing a simple reply :)

>but I think we are pretty much on the same page...
>  
>
Maybe some meaning got lost because I'm not a native English speaker :)
But basically, yep.

>I just sent that stuff out for the sake of awareness
>mostly...
>
>There needs to be a set of considered options for
>doing ADS persistence and caching / In memory storing
>of directory entries, so I just wanted to make sure I
>threw EMF out there for the long run as something to
>check out.
>
>I'll get some examples worked up soon.
>  
>
Great !

Emmanuel

