Hi Emmanuel,

On Dec 14, 2007 10:53 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:
Very valid points, Alex. We have had the same discussion a while back
about DN parsing...
 
Yeah, I think we talked about this a while back too, while annotating this experimental code with ideas in the Javadocs.
 

My personal guess is that you are almost fully right, but there might be
cases where we may want to check some parts of the values. The H/R
aspect, for instance, directly drives the type of value we will create.
 
Yeah, we're not completely free of having to do something, I agree.  We just want to minimize how much schema checking we enforce.  We do what we have to do to remove some headaches, but it's not our primary objective in this region of the code.
 

We can let the Schema interceptor deal with normalization and syntax
checking, instead of asking the EntryAttribute to do the checking. That
means we _must_ put this interceptor very high in the chain.
 
Right now I think this is split into two interceptors.  The first one, which is executed immediately, is the Normalization interceptor.  It's really an extension of the schema subsystem: normalization cannot occur without schema information, and the process of normalization automatically enforces value syntax.  This is because, to normalize, the parsers embedded in a normalizer must validate the syntax in order to transform the value into a canonical representation using string prep rules.
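To make that concrete, here's a rough sketch (not the actual ApacheDS Normalizer interface, just an illustration) of a telephone number normalizer where producing the canonical form necessarily implies validating the syntax:

    import javax.naming.NamingException;
    import javax.naming.directory.InvalidAttributeValueException;

    public class TelephoneNumberNormalizer
    {
        /**
         * Normalizes a telephone number by keeping digits and '+' and
         * dropping spaces and hyphens.  If the value cannot be parsed as a
         * telephone number we cannot produce a canonical form, so a syntax
         * violation falls out of normalization as a side effect.
         */
        public String normalize( String value ) throws NamingException
        {
            StringBuilder canonical = new StringBuilder();

            for ( int i = 0; i < value.length(); i++ )
            {
                char c = value.charAt( i );

                if ( Character.isDigit( c ) || c == '+' )
                {
                    canonical.append( c );
                }
                else if ( c == ' ' || c == '-' )
                {
                    continue; // insignificant characters are dropped
                }
                else
                {
                    // normalization fails exactly where a syntax check would
                    throw new InvalidAttributeValueException( "Not a telephone number: " + value );
                }
            }

            return canonical.toString();
        }
    }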
 
The big difference that has evolved between the Normalization interceptor and the Schema interceptor is that the Normalization interceptor is not designed to fully check schema.  It does *ONLY* what it needs to do to evaluate the validity of a request against the DIT.  For example, the DN and the filter expression are normalized early to determine if we can short-circuit this process with a rapid return.  This reduces latency and weeds out most incorrect requests.  Now, with normalized parameters, the Exception interceptor can more accurately do its work to determine whether or not the request makes sense: i.e. does the entry being deleted actually exist?  Then the request goes deeper into the interceptor chain for further processing.  The key concept in terms of normalization and schema checking is lazy execution.
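As a self-contained sketch of that ordering (plain Strings stand in for the real DN and filter types, and the Store interface below is hypothetical, not our interceptor API), the early checks look roughly like this:

    import javax.naming.NameNotFoundException;
    import javax.naming.NamingException;

    // Simplified stand-in for the Normalization + Exception interceptors in
    // front of a delete: only what is needed to validate the request against
    // the DIT is normalized up front; full schema checks come later, and only
    // if the request survives these early checks.
    public class EarlyChecks
    {
        interface Store
        {
            boolean hasEntry( String normalizedDn );
            void delete( String normalizedDn ) throws NamingException;
        }

        private final Store store;

        public EarlyChecks( Store store )
        {
            this.store = store;
        }

        // stand-in for the Normalization interceptor step
        private String normalizeDn( String dn )
        {
            // real code would use schema-aware normalizers; lower-casing and
            // trimming is just a placeholder for the canonical form
            return dn.trim().toLowerCase();
        }

        public void delete( String dn ) throws NamingException
        {
            String normalized = normalizeDn( dn );   // fail fast on bad parameters

            if ( !store.hasEntry( normalized ) )     // cheap sanity check next
            {
                throw new NameNotFoundException( normalized );
            }

            store.delete( normalized );              // only now do the real work
        }
    }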
 
Lazy execution makes sense most of the time, but from the many conversations we've had it seems this might actually be harming us, since we're doing many of the same computations over and over again while discarding the results, especially where normalization is concerned.
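One obvious mitigation would be to cache the normalized form next to the user-provided value so it is computed at most once per value; something along these lines (hypothetical names, just to illustrate the idea):

    // Minimal sketch of caching the normalized form alongside the user
    // provided value, so the same normalization is not recomputed and thrown
    // away at every layer.
    public class CachingValue
    {
        public interface Normalizer
        {
            String normalize( String value ) throws Exception;
        }

        private final String userProvided;
        private String normalized;    // computed once, on first use

        public CachingValue( String userProvided )
        {
            this.userProvided = userProvided;
        }

        public String getNormalized( Normalizer normalizer ) throws Exception
        {
            if ( normalized == null )
            {
                normalized = normalizer.normalize( userProvided );
            }

            return normalized;
        }

        public String getUserProvided()
        {
            return userProvided;
        }
    }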
 

Here are the possible checks we can have on a value for an attribute :
 

H/R : could be done when creating the attribute or adding some value into it
 
Yes, this will have to happen very early within the codec, I guess, right?
 

Syntax checking : SchemaInterceptor
Normalization : SchemaInterceptor
 
Right now request parameters are normalized within the Normalization interceptor, and these other aspects (items) are handled in the Schema interceptor.
 

Single value : SchemaInterceptor

So I would say we should simply test the H/R flag in EntryAttribute.
 
Yes, this sounds like something we must do to create the correct entry composition in the codec.  Otherwise we would need an intermediate representation, which is a waste of memory and cycles.
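Something along these lines is what I have in mind (hypothetical names, not the real EntryAttribute), with the H/R flag being the only schema input the attribute itself needs:

    import java.io.UnsupportedEncodingException;
    import java.util.ArrayList;
    import java.util.List;

    // The only schema information consulted at construction time is the H/R
    // flag, which decides whether a raw value coming out of the codec is
    // stored as a String or kept as a byte[].
    public class HumanReadableAwareAttribute
    {
        private final String id;
        private final boolean humanReadable;               // from the attribute's syntax
        private final List<Object> values = new ArrayList<Object>();

        public HumanReadableAwareAttribute( String id, boolean humanReadable )
        {
            this.id = id;
            this.humanReadable = humanReadable;
        }

        public void add( byte[] rawValue ) throws UnsupportedEncodingException
        {
            if ( humanReadable )
            {
                // decode once, up front: no intermediate representation needed later
                values.add( new String( rawValue, "UTF-8" ) );
            }
            else
            {
                values.add( rawValue );
            }
        }

        public String getId()
        {
            return id;
        }
    }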
 

It brings to my mind another concern :
let's think about what could happen if we change the schema : we will
have to update all the existing Attributes, which is simply not
possible. Thus, storing the AttributeType within the EntryAttribute does
not sound good anymore. (unless we kill all the current requests before
we change the schema). It would be better to store an accessor to the
schema sub-system, no ?
 
This is a big concern.  For this reason I prefer holding references to high-level service objects which can swap out things like registries when the schema changes.  This is especially important within services and interceptors that depend on the schema service in particular.  I would rather spend an extra cycle doing more lookups, with lazy resolution, which leads to a more dynamic architecture.  Changes to components are reflected immediately this way, with little impact in terms of leaving stale objects around which may present problems and need to be cleaned up.
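In code the idea is roughly this (hypothetical names): the attribute keeps its OID plus a live accessor and resolves the AttributeType on demand, instead of caching one that can go stale:

    // Sketch of the "accessor instead of snapshot" idea: an extra lookup per
    // call, but always consistent with the current schema.
    public class SchemaAwareAttribute
    {
        public interface AttributeType
        {
            boolean isSingleValued();
        }

        public interface AttributeTypeRegistry
        {
            AttributeType lookup( String oid ) throws Exception;
        }

        private final String oid;
        private final AttributeTypeRegistry registry;   // live accessor, never a cached AttributeType

        public SchemaAwareAttribute( String oid, AttributeTypeRegistry registry )
        {
            this.oid = oid;
            this.registry = registry;
        }

        public boolean isSingleValued() throws Exception
        {
            // resolved on every call, so a schema swap is picked up immediately
            return registry.lookup( oid ).isSingleValued();
        }
    }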
 
However on the flip side there's a line we need to draw.  Where we draw this line will determine the level of isolation we want.  Let me draw out a couple of specific scenarios to clarify. 
 
Scenario 1
========
 
A client binds to the server and pulls the schema at version 1; then, before it issues an add operation for a specific objectClass, the schema changes and one of the objectClasses in the entry to be added is no longer present.  The request will fail, and should, since the schema changed.  Incidentally, a smart client should check the subschemaSubentry timestamps before issuing write operations to see if it needs to check for schema changes that would make the request invalid.
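With plain JNDI such a client-side check could look roughly like this (the attribute names are standard; the surrounding logic is just a hypothetical sketch):

    import javax.naming.NamingException;
    import javax.naming.directory.Attributes;
    import javax.naming.directory.DirContext;

    // Read the subschemaSubentry reference from the Root DSE and compare its
    // modifyTimestamp with the one seen when the schema was first pulled.
    public class SchemaFreshnessCheck
    {
        public static boolean schemaChanged( DirContext ctx, String lastSeenTimestamp )
            throws NamingException
        {
            // the Root DSE advertises where the schema lives
            Attributes rootDse = ctx.getAttributes( "", new String[] { "subschemaSubentry" } );
            String subschemaDn = (String) rootDse.get( "subschemaSubentry" ).get();

            // compare the subentry's modifyTimestamp with what we cached at bind time
            Attributes subentry = ctx.getAttributes( subschemaDn, new String[] { "modifyTimestamp" } );
            String current = (String) subentry.get( "modifyTimestamp" ).get();

            return !current.equals( lastSeenTimestamp );
        }
    }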
 
Scenario 2
========
 
A client binds to the server and pulls the schema at version 1, then issues an add request.  While the add request is being processed by the server, the schema changes and one of the objectClasses in the entry to be added is no longer present.
 
Scenario 1 is pretty clear and easy to handle.  It will be handled automatically for us anyway, without having to explicitly code the correct behavior.  Scenario 2 is a bit tricky.  First of all we have to determine the correct behavior that needs to be exhibited.  Before confirming with the specifications (which we need to do), my suspicion is that this add request should be allowed, since it was issued and received before the schema change was committed.  In this case it's OK for the add request to contain handles on schema data which might be old but is consistent with the time at which that request was issued.
 
So to conclude, I think it's OK, preferred, and efficient for request parameters and the intermediate derived data structures used to evaluate requests to hold and leverage schema information that is not necessarily up to date with the last schema change.  This brings up a slew of other problems we have to tackle, btw, but we can talk about those in another thread.
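Concretely, I'm imagining something like the following (hypothetical names): the operation grabs the registries reference in effect when the request arrives and uses that view for its whole lifetime, even if a newer schema is swapped in underneath it:

    // Sketch of giving each request a consistent schema view: the registries
    // reference is captured once, when the request is received, while the
    // schema service can swap in a new set at any time.
    public class OperationContextSketch
    {
        public interface Registries { /* lookups against one schema version */ }

        public static class SchemaService
        {
            private volatile Registries current;

            public Registries getRegistries()
            {
                return current;       // always the latest committed schema
            }

            public void swap( Registries next )
            {
                current = next;       // schema change: in-flight operations keep their old reference
            }
        }

        private final Registries registries;

        public OperationContextSketch( SchemaService schemaService )
        {
            // snapshot taken once, at request receipt
            this.registries = schemaService.getRegistries();
        }

        public Registries getRegistries()
        {
            return registries;        // consistent view for the whole operation
        }
    }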
 
SNIP ...
 

> If the answer is apply all schema checks then how do we deal with
> situations where the entry is inconsistent during composition but will
> be consistent at the end?  For example you have an inetOrgPerson that
> requires sn and cn attributes.  The user adds the objectClass
> attribute with the inetOrgPerson value into the Entry.  If we have
> schema checks enabled then this user action will trigger a violation
> error.   Likewise if they add sn or cn before they add the objectClass
> attribute since these attributes will not be in the must may list yet.
That's not exactly what we want to introduce into the Entry class. This
is clearly done by the Schema interceptor system. But it was not my
initial concern, too, as I was specifically mentioning the
EntryAttribute alone, not the Entry as a whole. So we are on the same
page here.
 
I was just trying to say: if we start doing schema checks, where do we stop?  However, we may want to do some checks early for very specific attributes, like the objectClass attribute for example.
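For example, an early objectClass-only check might look something like this (hypothetical sketch; the full MUST/MAY consistency checks would still live in the Schema interceptor):

    import java.util.Set;
    import javax.naming.NamingException;
    import javax.naming.directory.Attribute;
    import javax.naming.directory.InvalidAttributeValueException;

    // Reject an add whose objectClass values are unknown, without doing any
    // of the heavier entry-level schema validation here.
    public class EarlyObjectClassCheck
    {
        public static void check( Attribute objectClass, Set<String> knownObjectClasses )
            throws NamingException
        {
            for ( int i = 0; i < objectClass.size(); i++ )
            {
                String value = ( (String) objectClass.get( i ) ).toLowerCase();

                if ( !knownObjectClasses.contains( value ) )
                {
                    throw new InvalidAttributeValueException( "Unknown objectClass: " + value );
                }
            }
        }
    }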
 
Another mode of thinking may suggest performing all schema checks immediately, in one place, since circumstances force us to deal with part of the problem anyway.  This line of thinking favors keeping the code associated with a specific function together in one place.
 
I don't know what the correct answer is here, but I was expressing the different ways we can approach this problem.  I know you were talking about attribute values, but soon you'll find this pulls us into the conversation about schema checks at the entry level.
 

> So I think we open a Pandora's box if we try to overload too much
> functionality into this Entry/Attribute API whose primary purpose is
> with respect to managing entry composition.
yeah. We need some balance. This is the reason I asked before doing
stupid things :) At the end, this will be less work  for me ;)
 
Oh, you're more right than you can imagine.  This is why I'm being overly analytical myself.  Slipping up here will have repercussions all over.  There's no single right answer, even though some answers will be very, very detrimental.  The top few best answers will also have tradeoffs associated with them, and evaluating those tradeoffs and coming to a conclusion on how best to proceed is what makes this such a difficult design problem.
 
Alex