directory-dev mailing list archives

From "Norval Hope" <nrh...@gmail.com>
Subject Re: [ApacheDS] [Schema] New schema subsystem specification
Date Fri, 24 Nov 2006 03:22:32 GMT
Sorry this thread is getting so long (I seem to have that effect)...

On 11/24/06, Alex Karasulu <aok123@bellsouth.net> wrote:
> Norval Hope wrote:
> ...
> >  1. I'd be much happier if the ".schema file => schema partition"
> > tool were instead (or also) available as an optional start-up
> > mechanism activatable by uncommenting support in server.xml. In the
> > use-cases dear to my heart users are able to easily register dynamic
> > custom partitions along with the .schema files they depend on by
> > simply placing files in various filesystem directories (ala
> > appservers) rather than having to run separate utilities.
>
> The utility can also generate an LDIF from .schema files (to add schema
> changes) that can be applied once on startup which effectively gives you
> what you want right?
>
> Given this
> > point I'd most probably do away with a maven plugin for the ".schema
> > => schema partition" bit and replace it with code to determine whether
> > the .schema partition needed to be populated with bootstrap
> > information on its first run after deployment (from .schema files
> > included in a release .jar). For dynamic updates/additions of .schema
> > files the relevant filesystem directories could be polled for changes
> > periodically (again ala appservers).
>
> Yeah there is a problem here with having 2 copies of the same data.
> Which one is the authoritative copy?  We'll have the same data in a
> .schema file on disk and in the DIT.  Where do we make changes when the
> schema is altered via the DIT?  What do we do if the schema files are
> changed on disk?  What if there are conflicts?  How will they be resolved?
>

My point is that in VD-like cases such as mine, AD is merely a custodian
of the schema for a custom partition and is in no sense managing it:
    a. The schema should be treated as read-only by AD; there is no
point in changing it anywhere other than at the target system with
which the custom partition communicates. The authoritative source is
the target system. AD is just acting as a pass-through.
    b. For the same reason it doesn't make sense for AD to persist the
schema information in this case. The custom partition may be
explicitly removed while AD is running, or its deployment bundle
removed and AD restarted, in which case I'd want all trace of the
schema info to disappear from AD when its associated partition
disappears.

Even in non-VD cases, I imagine the bulk of the schemas currently
imported into AD are best considered static, in the sense that end-user
modification of them at runtime could easily destabilise the server.
When a schema is governed by an RFC or a spec authored by a third
party, it would seem that end-user modifications of it (except
perhaps additions) should generally be outlawed. Where such schemas are
used internally by the server, updating them implies needing to
update the server's code at the same time, no?

> >  2. Being able to change schema information is a very power-user
> > feature, but I'd imagine that a much more common usage is simply
> > wanting to add extra read-only schema information (matching various
> > RFCs and/or basically static schemas defined by third party vendors)
> > after deployment. In my usecases storing the thirdparty (i.e.
> > non-core) schema information persistently is actually a minus rather
> > than a plus; I'd prefer my users could deploy another custom partition
>
> Another partition?

Here I mean a custom partition authored by one of my clients. As with
appservers, they deploy a bundle, causing AD to expose a new
partition.

>
> > with updated schema information and restart AD without having to worry
> > about clashes with existing information. Is it theoretically possible
> > to identify various schema subtrees as "read-only" so that they can't
> > be changed and aren't persisted, but are instead transiently populated
> > from .schema files at start-up?
>
> Might be able to do this but I'm very against the idea of parsing
> .schema files on startup.  Plus there are things you cannot store in
> .schema files that you can store in the DIT.  Like normalizers,
> syntaxCheckers and comparators.
>

Ok, if you're against reading .schema files (or "schema+ " files that
contain the extra information you mention) then it sounds like I'll
need to keep my support as a custom patch to AD instead.

On normalizers, syntaxCheckers etc., am I right in thinking that,
regardless of the syntax of the text file you use as your initial
source, there is the problem that ultimately you need to bind code /
behaviour to their definitions? Other than its name(s), OID etc., a
normalizer is basically the code that implements the normalization,
right? If so, then allowing people to add their own ones (not included
in the AD release) is going to involve classloading issues etc., as
well as dealing with a textual descriptive file.
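
To illustrate what I mean by binding code to definitions, here is a
minimal Python sketch. It is not AD code: the OIDs, names and the
bind_normalizers helper are all made up for illustration. The text
file can only supply the descriptive half; the behaviour has to come
from somewhere else.

```python
# Hypothetical sketch; none of these names or OIDs come from ApacheDS.

# The part a text file can carry: purely descriptive metadata.
DESCRIPTIONS = {
    "1.3.6.1.4.1.99999.1": {"name": "exampleCaseIgnoreNormalizer"},
    "1.3.6.1.4.1.99999.2": {"name": "exampleVendorNormalizer"},
}


def normalize_case_ignore(value: str) -> str:
    """Lower-case the value and collapse runs of whitespace."""
    return " ".join(value.split()).lower()


# The part that has to be shipped as code alongside the text file.
IMPLEMENTATIONS = {
    "1.3.6.1.4.1.99999.1": normalize_case_ignore,
}


def bind_normalizers(descriptions, implementations):
    """Pair each described normalizer with its code.

    Returns (bound, missing): bound maps OID -> callable, missing lists
    the OIDs that were described but have no implementation available.
    """
    bound, missing = {}, []
    for oid in descriptions:
        impl = implementations.get(oid)
        if impl is None:
            missing.append(oid)
        else:
            bound[oid] = impl
    return bound, missing
```

A normalizer that is described but has no code to bind to ends up in
the "missing" list, which is exactly the case a third-party schema
would hit unless it ships its own implementation.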

I apologize if I'm talking crap, just trying to understand these other
objects a bit better.

> >  3. Whether modifying schema information via piece-meal updates or
> > whole .schema file imports, we face questions re versioning / draining
> > of pending requests referring to old version of the schema etc. Is the
> > race condition between schema changes and other operations referring
> > to the schema something that needs to be considered now, say by
> > synchronizing access to the schema partition?
>
> Schema information under this new design is just like any kind of data
> within the server.  The same shared/exclusive lock requirements apply
> wrt read/write operations.
>

With meta information like schema isn't the problem a bit worse
though? What I'm thinking about is this sort of case (given MINA
worker threads are executing concurrently):
    a. user1 submits modify of attr "a" of object o1 of objectclass c1
(MINA thread 1)
    b. user2 submits delete of attr "a" from schema for c1 (MINA thread 2)
where b. implies a lock on any attempts to change attr "a" in any
instance of c1, and a. implies a lock on changing the schema for c1
(or at least on modifying the type of, or deleting, attr "a").

So isn't it a bit different because locks need to flow forward to /
back from meta information?
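
To make the a./b. scenario concrete, here is a minimal Python sketch
(not AD or MINA code; every class and method name is hypothetical). It
assumes one coarse shared/exclusive lock over the whole schema: data
modifications hold it shared while they consult the schema, and schema
changes hold it exclusively and also check the data for use of the
attribute.

```python
import threading


class ReadWriteLock:
    """Simple shared/exclusive lock built on a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class Directory:
    """Toy directory: one objectclass c1 and one entry o1."""

    def __init__(self):
        self.schema_lock = ReadWriteLock()
        self.schema = {"c1": {"a", "b"}}
        self.entries = {"o1": {"objectclass": "c1", "attrs": {"a": "old"}}}

    def modify_attr(self, dn, attr, value):
        # Case a.: a data write must see a stable schema, so hold it shared.
        self.schema_lock.acquire_shared()
        try:
            entry = self.entries[dn]
            if attr not in self.schema[entry["objectclass"]]:
                raise ValueError(f"{attr!r} not allowed by schema")
            entry["attrs"][attr] = value
        finally:
            self.schema_lock.release_shared()

    def delete_schema_attr(self, objectclass, attr):
        # Case b.: a schema write excludes all data writes while it runs,
        # and refuses to remove an attribute that entries still use.
        self.schema_lock.acquire_exclusive()
        try:
            for entry in self.entries.values():
                if entry["objectclass"] == objectclass and attr in entry["attrs"]:
                    raise ValueError(f"{attr!r} still in use")
            self.schema[objectclass].discard(attr)
        finally:
            self.schema_lock.release_exclusive()
```

A single schema-wide lock is much coarser than the per-objectclass or
per-attribute locking I describe above, but it shows the direction in
which the locks have to flow.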

> > I know my focus is out of whack with AD's primary objectives, in that
> > I don't use it as a persistent store at all,
>
> NP.
>
> but even so I see
> > populating at start-up rather than maven plugin + import utility
>
> Note that this maven plugin is not for general use.  It is used to
> pre-build the schema partition that will be deposited on disk if the
> schema partition has not yet been created.
>

Sure, but it relies on much the same code as the proposed LDIF tool.

> As for the import utility it can just generate an LDIF that you can
> load on startup.  You can provide schemas in LDIF format for your users.
>   The good thing with AD is that if you load an LDIF on startup AD marks
> that LDIF file as already having been loaded and will not load it again.
>
> It keeps a record of what was loaded when under the ou=system area.
>

Understood.

My problem is that one of my design goals is to keep the work required
of my clients' custom partition writers to an absolute minimum.
Currently they deploy a bundle which can optionally include a .schema
file and that's it. I need to maintain that simplicity, so whether it's
in the core AD code (looking very unlikely I gather) or via a custom
patch to AD that I maintain, I have to hide any steps required to
incorporate a new schema into the server.

Also the fact that the LDIF information is persisted and guarded
from reloading is actually a minus in my case, because:
    a) I want to reload the schema information each time, because it is
maintained by the author of a custom partition bundle who may have
updated it in line with an updated version of their bundle code
    b) If the schema information for a custom partition is persisted
then I have a problem getting rid of it when AD starts up next time
and this custom partition is no longer deployed.

I planned to deal with the extra info (normalizers etc. you mention
above) by looking for code associated with .schema files that defined
the required extra Java classes. The need for custom additions to the
existing schema files in this space seems very much a boundary case to
me anyway; these are the stats on such extensions that exist today:

Schema          comparators  matching  normalizers  syntax    syntax
                             rules                  checkers  producers
Apache.schema             3         3            3         0          0
NIS.schema                1         1            1         2          2
Inetorgperson             4         4            4         0          0
System                   27        28           27        59         59

where a fair number of the implementations of these various extensions
look like stubs. As I raised earlier in this diatribe, isn't it very
likely that any such extensions required for a thirdparty schema will
require their own custom code?

> as a
> > universal plus in terms of flexibility / amount of code required.
>
> I think some points I did not make clear.  The schema partition is a
> single partition that will always be present just like the system
> partition.  You will not be loading schema info into just any partition.
>   This partition is dedicated and fixed at ou=system.  Regardless of the
> VD you're building you'll still need to have this schema partition or
> ApacheDS or your derived virtual directory will not start.
>
> What are some of your requirements for the VD you're working on?
>
> Alex
>
>
>

To try and put it in a nutshell, the requirements on my solution are as follows:
    1. It must be possible to dynamically register read-only (from
AD's viewpoint) schema information associated with dynamically
registered custom partitions, to facilitate AD acting as a
pass-through container hosting custom partitions acting as adapters
from LDAP to various target systems (which can be LDAP themselves,
using a different schema, or other technologies).
    2. Such schema information needs to be loaded and readily
upgradable using a simple and commonly used standard representation
(i.e. OpenLDAP .schema files), which in rare cases may need to be
augmented with extra code defining and implementing normalizers /
matching rules etc. as dictated by the schema in question.
    3. When the last dynamic custom partition requiring a collection
of schema information is deregistered, this schema information should
no longer be exposed by AD. Additionally AD should start up with only
its standard schemas loaded, and schemas required for dynamic custom
partitions added lazily as these partitions are accessed and the
schema information becomes necessary.
    4. In short, having AD persist the schema information for these
possibly transient dynamic custom partitions is a hindrance rather
than a help.
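
As a rough illustration of requirements 1, 3 and 4, here is a minimal
Python sketch of a transient, in-memory schema registry. It is not my
actual patch and not AD code: TransientSchemaRegistry, its methods and
the loader callable (standing in for a .schema file parser) are all
hypothetical names.

```python
class TransientSchemaRegistry:
    """In-memory registry of schemas required by dynamic partitions.

    Nothing is persisted: a schema is exposed only while at least one
    registered partition needs it, and is parsed lazily on first access.
    """

    def __init__(self, core_schemas, loader):
        self._core = set(core_schemas)  # always-present standard schemas
        self._loader = loader           # callable: schema name -> parsed schema
        self._users = {}                # schema name -> ids of partitions using it
        self._loaded = {}               # schema name -> parsed schema (memory only)

    def register(self, partition_id, schema_name):
        """Record that a partition needs a schema; does not load it yet."""
        self._users.setdefault(schema_name, set()).add(partition_id)

    def get(self, schema_name):
        """Return the parsed schema, loading it on first access."""
        if schema_name not in self._users:
            raise KeyError(schema_name)
        if schema_name not in self._loaded:
            self._loaded[schema_name] = self._loader(schema_name)
        return self._loaded[schema_name]

    def deregister(self, partition_id):
        """Drop a partition; forget schemas that no partition needs any more."""
        for name in list(self._users):
            self._users[name].discard(partition_id)
            if not self._users[name]:
                del self._users[name]
                self._loaded.pop(name, None)

    def exposed(self):
        """Names of all schemas currently visible to clients."""
        return self._core | set(self._users)
```

Because nothing is written to disk, a restart without a given bundle
deployed starts again from the core schemas only, and redeploying an
updated bundle simply reloads its .schema content.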

At any rate, it seems like my requirements are completely disjoint
from what you want to achieve in the schema subsystem redesign.

I already have a solution meeting my requirements by removing the need
for the existing Maven schema plugin, and instead allowing schema
content to be imported to an in-memory representation at start-up.
This solution is only a stepping stone to a more dynamic one, which
required doing away with the BootstrapRegistries stuff amongst other
things.

I can help implement your plan and then rejig my current scheme on top
of the new code, but I can't pretend that I'm not a little
disappointed that there isn't a solution addressing both the core
directory's and my "pass-through" type requirements at the same time.
