directory-dev mailing list archives

From Alex Karasulu <aok...@bellsouth.net>
Subject Re: [ApacheDS] [Schema] New schema subsystem specification
Date Fri, 24 Nov 2006 16:48:25 GMT
Norval Hope wrote:
> Sorry this thread is getting so long (I seem to have that effect)...

Anything important and this complex takes effort.

BTW, let me mention right away that several of my decisions 
for this redesign were based on replication requirements.

So there may be some disparity in what we are trying to accomplish.

> On 11/24/06, Alex Karasulu <aok123@bellsouth.net> wrote:
>> Norval Hope wrote:
>> ...
>> >  1. I'd be much happier if the ".schema file => schema partition"
>> > tool were instead (or also) available as an optional start-up
>> > mechanism activatable by uncommenting support in server.xml. In the
>> > use-cases dear to my heart users are able to easily register dynamic
>> > custom partitions along with the .schema files they depend on by
>> > simply placing files in various filesystem directories (ala
>> > appservers) rather than having to run separate utilities.
>>
>> The utility can also generate an LDIF from .schema files (to add schema
>> changes) that can be applied once on startup which effectively gives you
>> what you want right?
>>
>> Given this
>> > point I'd most probably do away with a maven plugin for the ".schema
>> > => schema partition" bit and replace it with code to determine whether
>> > the .schema partition needed to be populated with bootstrap
>> > information on its first run after deployment (from .schema files
>> > included in a release .jar). For dynamic updates/additions of .schema
>> > files the relevant filesystem directories could be polled for changes
>> > periodically (again ala appservers).
>>
>> Yeah there is a problem here with having 2 copies of the same data.
>> Which one is the authoritative copy?  We'll have the same data in a
>> .schema file on disk and in the DIT.  Where do we make changes when the
>> schema is altered via the DIT?  What do we do if the schema files are
>> changed on disk?  What if there are conflicts?  How will they be 
>> resolved?
>>
> 
> My point is that in VD-like cases such as mine, AD is merely a custodian
> of a schema for a custom partition and is in no sense managing it:

Ok this is a good point!  And you're right.  I agree that AD when acting 
as a virtual directory needs to simply publish authoritative schema 
information pulled from the target system.

If it does store this information (which is not good) it must be 
read-only to prevent conflicts.

>    a. It should be treated as read-only by AD; there is no point in
> changing it anywhere other than at the target system to which the custom
> partition communicates. The authoritative source is the target system.
> AD is just acting as a pass-through.

Now is the schema expected to change if the data is changed on the 
target system?

>    b. For the same reason it doesn't make sense for AD to persist the
> schema information in this case, the custom partition may be
> explicitly removed while AD is running or its deployment bundle
> removed and AD restarted, in which case I'd want all trace of the
> schema info to disappear from AD when its associated partition
> disappeared.

Hmmm this makes sense as well.

> Even in non-VD cases, I imagine the bulk of the schemas currently
> imported into AD are best considered static in the sense that end-user
> modification of them at runtime could easily destabilise the server.

Oh yeah.  This is something we can discuss until the cows come home. 
Many LDAP servers allow you to change schema information even when 
entries exist in the server using those entities that are changed.

This is a very dangerous thing to do because it makes the content and 
the server itself unstable.  Schema changes that modify or delete 
entities are really dangerous.  Adds, on the other hand, are fine, 
and this is generally the way in which schema changes are made.

TOOLING IS THE KEY!!!

NOTE: BUILD THIS FUNCTIONALITY INTO LDAP STUDIO.

The only way to make sure updates, and deletes to schema entities do not 
make the directory inconsistent (without slowing down the server) is to 
use tools to analyze the effects of schema changes on the entity population.
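
A rough sketch of the kind of check such a tool could run before a schema 
delete is applied.  Everything here is hypothetical (the Entry stand-in, the 
class and method names); it is not LDAP Studio or ApacheDS code, just the 
idea of scanning the entry population for users of an attributeType that is 
about to be removed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical offline check: which entries still use an attributeType slated for deletion? */
final class SchemaImpactAnalyzer {

    /** Minimal stand-in for a directory entry: a DN plus the attribute names it holds. */
    static final class Entry {
        final String dn;
        final Set<String> attributeNames;

        Entry(String dn, Set<String> attributeNames) {
            this.dn = dn;
            this.attributeNames = attributeNames;
        }
    }

    /** Returns the DNs of entries that would become inconsistent if the attribute were deleted. */
    static List<String> entriesAffectedByDelete(List<Entry> entries, String attributeName) {
        List<String> affected = new ArrayList<>();
        for (Entry entry : entries) {
            for (String name : entry.attributeNames) {
                // Attribute type names are compared case-insensitively here.
                if (name.equalsIgnoreCase(attributeName)) {
                    affected.add(entry.dn);
                    break;
                }
            }
        }
        return affected;
    }
}
```

An empty result would mean the delete is safe with respect to the scanned 
entries; a non-empty one tells the admin what has to be cleaned up first.  
The scan runs in the tool, so the server itself is not slowed down.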

> When a schema is governed by an RFC or a spec authored by a third
> party, it would seem that end-user modifications of it (except
> perhaps additions) would be generally outlawed. 

That makes perfect sense however you still have the ability to change 
published schema in most LDAP servers.  IMO this is pretty bad form when 
the proper way to go would be to extend an objectClass or define a new 
attribute if an existing one does not suit your needs.

The biggest problem with those bastardizing schema is that they don't 
have an IANA-assigned enterprise number and they think changing existing 
standard schema is the best way to cope.  This is bad news.

Where such schemas are
> used internally by the server, then updating them implies needing to
> update the server's code at the same time, no?

Not necessarily.  I think you're referring to the extra code elements 
like normalizers, syntaxCheckers, and comparators.

An example is best here.  If you change an objectClass by adding an 
additional MAY attribute, then no code change is required.  In most 
cases code changes are *NOT* necessary.  Here's an example where a code 
change is needed:

You create a new social security (SS) attribute with its own syntax. 
Now you need a syntaxChecker for that new SS syntax to perform 
validation.  Say you want some nice format for US SS numbers like 
666-66-6666.  Then your syntaxChecker can enforce this.
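
Something along these lines, as a minimal sketch.  The class and method 
names are made up for illustration and are not the actual ApacheDS 
syntaxChecker interface; the only thing taken from above is the 
NNN-NN-NNNN format:

```java
import java.util.regex.Pattern;

/** Hypothetical syntax checker for a US social security number syntax. */
final class SsnSyntaxChecker {

    // Three digits, two digits, four digits, separated by dashes (e.g. 666-66-6666).
    private static final Pattern SSN = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    /** Returns true only for String values that match the NNN-NN-NNNN format exactly. */
    static boolean isValidSyntax(Object value) {
        return value instanceof String && SSN.matcher((String) value).matches();
    }
}
```

This is the piece of code that has to ship with the new syntax; the 
attributeType and syntax definitions themselves are still just data.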

>> >  2. Being able to change schema information is a very power-user
>> > feature, but I'd imagine that a much more common usage is simply
>> > wanting to add extra read-only schema information (matching various
>> > RFCs and/or basically static schemas defined by third party vendors)
>> > after deployment. In my usecases storing the thirdparty (i.e.
>> > non-core) schema information persistently is actually a minus rather
>> > than a plus; I'd prefer my users could deploy another custom partition
>>
>> Another partition?
> 
> Here I mean a custom partition authored by one of my clients. As per
> appservers, they deploy a bundle causing AD to expose a new
> partition.

Ok I see.

>> > with updated schema information and restart AD without having to worry
>> > about clashes with existing information. Is it theoretically possible
>> > to identify various schema subtrees as "read-only" so that they can't
>> > be changed and aren't persisted, but are instead transiently populated
>> > from .schema files at start-up?
>>
>> Might be able to do this but I'm very against the idea of parsing
>> .schema files on startup.  Plus there are things you cannot store in
>> .schema files that you can store in the DIT.  Like normalizers,
>> syntaxCheckers and comparators.
>>
> 
> Ok, if you're against reading .schema files (or "schema+ " files that
> contain the extra information you mention) then it sounds like I'll
> need to keep my support as a custom patch to AD instead.

Well don't give up just yet.  We need to figure something out for your 
needs.   I'm starting to think we may need a special project just for 
virtual directories where the schema subsystem is designed a bit 
differently.

Or we need to add virtualization capabilities into this new schema design.

Don't worry we'll figure something out.

> On normalizers, syntaxCheckers etc am I right in thinking that
> regardless of syntax of the text file you use you're going to use as
> your initial source, there is the problem that ultimately you need to
> bind code / behaviour to their definitions: other than name(s) and OID
> etc a normalizer is basically the code that implements the
> normalization, right? 

Yep I was thinking this is byte code in an entry for a normalizer element 
in the schema area.

If so then allowing people to add their own ones
> (not included in the AD release) is going to involve classloading
> issues etc, as well as dealing with textual descriptive file.

Yep.  We're going to need to find a nice way to deal with this.

> I apologize if I'm talking crap, just trying to understand these other
> objects a bit better.

No you're fine. Don't stress.

>> >  3. Whether modifying schema information via piece-meal updates or
>> > whole .schema file imports, we face questions re versioning / draining
>> > of pending requests referring to old version of the schema etc. Is the
>> > race condition between schema changes and other operations referring
>> > to the schema something that needs to be considered now, say by
>> > synchronizing access to the schema partition?
>>
>> Schema information under this new design is just like any kind of data
>> within the server.  The same shared/exclusive lock requirements apply
>> wrt read/write operations.
>>
> 
> With meta information like schema isn't the problem a bit worse
> though? What I'm thinking about is this sort of case (given MINA
> worker threads are executing concurrently):
>    a. user1 submits modify of attr "a" of object o1 of objectclass c1
> (MINA thread 1)
>    b. user2 submits delete of attr "a" from schema for c1 (MINA thread 2)
> where b. implies a lock on any attempts to change attr "a" in any
> instance of c1, and a. implies a lock on changing the schema for c1
> (or at least modifying type of /deleting attr "a" anyway).

Good point.  Yes the locking can be made more complex like this. 
However presently many LDAP servers leave such changes as undefined. 
You're warned not to mess with things like this.

This is not satisfactory if you ask me.  However dealing with this topic 
could take a long time.  Nothing is defined in the protocol.  Whatever 
we decide to do to manage this situation would have to be custom designed.

> So isn't it a bit different because locks need to flow forward to /
> back from meta information?

Yes it is different.  You have to lock all changes to entries of the 
objectClass being changed.
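
One way this could look, purely as a sketch of a custom design (nothing in 
the protocol defines it, and the names below are invented): keep one 
read/write lock per objectClass, let entry modifications take the shared 
side and schema changes take the exclusive side.  It ignores entries with 
several objectClasses and objectClass inheritance, which a real design 
would have to handle:

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical per-objectClass locking between entry changes and schema changes. */
final class ObjectClassLocks {

    private final ConcurrentMap<String, ReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReadWriteLock lockFor(String objectClass) {
        // One lock per objectClass name, created on first use.
        return locks.computeIfAbsent(objectClass.toLowerCase(Locale.ROOT),
                key -> new ReentrantReadWriteLock());
    }

    /** Entry modifications share the lock: many may run concurrently. */
    <T> T withEntryChange(String objectClass, Supplier<T> operation) {
        ReadWriteLock lock = lockFor(objectClass);
        lock.readLock().lock();
        try {
            return operation.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** A schema change is exclusive: it waits for in-flight entry changes and blocks new ones. */
    <T> T withSchemaChange(String objectClass, Supplier<T> operation) {
        ReadWriteLock lock = lockFor(objectClass);
        lock.writeLock().lock();
        try {
            return operation.get();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

In Norval's example, thread 1 would run its modify of o1 inside 
withEntryChange("c1", ...) and thread 2 would run the delete of attr "a" 
inside withSchemaChange("c1", ...), so the two can never interleave.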

>> > I know my focus is out of whack with AD's primary objectives, in that
>> > I don't use it as a persistent store at all,
>>
>> NP.
>>
>> but even so I see
>> > populating at start-up rather than maven plugin + import utility
>>
>> Note that this maven plugin is not for general use.  It is used to
>> pre-build the schema partition that will be deposited on disk if the
>> schema partition has not yet been created.
>>
> 
> Sure, but it relies on much the same code as the proposed LDIF tool.

Yeah so I guess we could include it in the server but this feels messy. 
  Right now I'm thinking there has to be a better solution to this problem.

Perhaps a partition can provide a method in its interface that exposes 
a custom schema associated with the partition which is its own SAA. 
Basically the partition can expose access to a schema object that is a 
facade for accessing various registries.  This automatically includes 
the partition's schema information in the global registries (it's joined).

The schema can also be registered with the schema subsystem as a virtual 
schema marked as read only and injected dynamically into the ou=schema 
area.  Replication wise there are no issues with this.  Basically a 
virtual schema will not be replicated with physical schema info.

This way partition startup can handle just how this information is 
obtained (parsed etc) yet the way it is exposed is the same.  The server 
can then handle this properly.
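
To make the idea a bit more concrete, here is a very rough sketch.  All of 
the interface, class and method names are invented for this mail and none 
of them exist in ApacheDS today; the schema is reduced to a map of 
attributeType OIDs to names just to keep it short:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical read-only facade over the schema a partition brings with it. */
interface PartitionSchema {
    String name();
    Map<String, String> attributeTypes(); // OID -> name
}

/** Hypothetical hook on the partition interface for exposing a custom schema. */
interface SchemaAwarePartition {
    String suffix();
    Optional<PartitionSchema> schema(); // empty if the partition has no schema of its own
}

/** Hypothetical holder for virtual schemas: read-only, never persisted, never replicated. */
final class VirtualSchemaRegistry {

    private final Map<String, PartitionSchema> virtualSchemas = new LinkedHashMap<>();

    /** Called on partition startup: joins the partition's schema to the global view. */
    synchronized void partitionAdded(SchemaAwarePartition partition) {
        partition.schema().ifPresent(schema -> virtualSchemas.put(partition.suffix(), schema));
    }

    /** Called when the partition goes away: all trace of its schema disappears with it. */
    synchronized void partitionRemoved(SchemaAwarePartition partition) {
        virtualSchemas.remove(partition.suffix());
    }

    /** Looks up an attributeType name by OID across all registered virtual schemas. */
    synchronized Optional<String> lookupAttributeType(String oid) {
        for (PartitionSchema schema : virtualSchemas.values()) {
            String name = schema.attributeTypes().get(oid);
            if (name != null) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }
}
```

How the partition fills in its PartitionSchema (parsing a .schema file, 
asking the target system, whatever) stays entirely its own business.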

Need to think more about this idea.

>> As for the import utility it can just generate an LDIF that you can
>> load on startup.  You can provide schemas in LDIF format for your users.
>>   The good thing with AD is that if you load an LDIF on startup AD marks
>> that LDIF file as already having been loaded and will not load it again.
>>
>> It keeps a record of what was loaded when under the ou=system area.
>>
> 
> Understood.
> 
> My problem is that one of my design goals is to keep work required by
> my client custom partition writers to an absolute minimum. Currently
> they deploy a bundle which can optionally include a .schema file and
> that's it. I need to maintain that simplicity, so whether it's in the
> core AD code (looking very unlikely I gather) or via a custom patch to
> AD that I maintain, I have to hide any steps required to incorporate a
> new schema into the server.
> 
> Also the fact that the LDIF information is persisted and guarded
> from reloading is actually a minus in my case, because:
>    a) I want to reload the schema information each time, because it is
> maintained by author of a custom partition bundle who may have updated
> it in line with an updated version of their bundle code
>    b) If the schema information for a custom partition is persisted
> then I have a problem getting rid of it when AD starts up next time
> and this custom partition is no longer deployed.
> 
> I planned to deal with the extra info (normalizers etc you mention
> above) by looking for code associated with .schema files that defined
> the required extra java classes. The need for custom additions to the
> existing schema files in this space seems very much a boundary case to
> me anyway, these are the stats on such extensions that exist today:
> 
> Apache.schema:
>    comparators: 3, matching rule: 3, normalizer: 3, syntax checker:
> 0, syntax producer: 0
> NIS.schema:
>    comparators: 1, matching rule: 1, normalizer: 1, syntax checker:
> 2, syntax producer: 2
> Inetorgperson:
>    comparators: 4, matching rule: 4, normalizer: 4, syntax checker:
> 0, syntax producer: 0
> System:
>    comparators: 27, matching rule: 28, normalizer: 27, syntax
> checker: 59, syntax producer: 59
> 
> where a fair number of the implementations of these various extensions
> look like stubs. As I raised earlier in this diatribe, isn't it very
> likely that any such extensions required for a thirdparty schema will
> require their own custom code?

Again not necessarily.

> 
>> as a
>> > universal plus in terms of flexibility / amount of code required.
>>
>> I think some points I did not make clear.  The schema partition is a
>> single partition that will always be present just like the system
>> partition.  You will not be loading schema info into just any partition.
>>   This partition is dedicated and fixed at ou=system.  Regardless of the
>> VD you're building you'll still need to have this schema partition or
>> ApacheDS or your derived virtual directory will not start.
>>
>> What are some of your requirements for the VD you're working on?
>>
>> Alex
>>
>>
>>
> 
> To try and put it in a nutshell, the requirements on my solution are as 
> follows:
>    1. It must be possible to dynamically register read-only (from
> AD's viewpoint) schema information associated with dynamically
> registered custom partitions, to facilitate AD acting as a
> pass-through container hosting custom partitions acting as adapters
> from LDAP to various target systems (where they can be LDAP themselves
> (using a different schema), or other technologies)
>    2. Such schema information needs to be loaded and readily
> upgradable using a simple and commonly used standard representation
> (i.e OpenLDAP .schema files), which in rare cases may need to be
> augmented with extra code defining and implementing normalizers /
> matching rules etc as dictated by the schema in question.
>    3. When the last dynamic custom partition requiring a collection
> of schema information is deregistered, this schema information should
> no longer be exposed by AD. Additionally AD should start up with only its
> standard schemas loaded, and schemas required for dynamic custom
> partitions added lazily as these partitions are accessed and the
> schema information becomes necessary.
>    4. In short having AD persist the schema information for these
> possibly transient dynamic custom partitions is a hindrance rather
> than a help.
> 
> At any rate, it seems like my requirements are completely disjoint
> from what you want to achieve in the schema subsystem redesign.
> 
> I already have a solution meeting my requirements by removing the need
> for the existing Maven schema plugin, and instead allowing schema
> content to be imported to an in-memory representation at start-up.
> This solution is only a stepping stone to a more dynamic one, which
> required doing away with the BootstrapRegistries stuff amongst other
> things.
> 
> I can help implement your plan and then rejig my current scheme on top
> of the new code, but I can't pretend that I'm not a little
> disappointed that there isn't a solution addressing both the core
> directory's and my "pass-through" type requirements at the same time.

Ok we can think more about how to make AD bend over backwards to do this 
right or we can just create another subproject to deal with 
virtualization, synchronization and other things.

Dave asked about this.  Now you have VD needs.  I'm seeing a trend here.

WDYT?

Alex

