Hi all,

Excuse the cross-post, but this is also relevant to the API list.

Problem
------------

For our own benefit and that of our users, we need to be uber careful with changes after a major GA release. In another thread, people seem to agree on the Eclipse versioning scheme, which sounds flexible enough for our needs: we can cut a 2.0.0-M1 release at any time without clamping down on APIs, and only when we cut an RC do we have to freeze changes to interfaces.

The debate still remains as to what constitutes an interface. Emmanuel seems to disagree that configuration, schema, and partition DB formats are interfaces of concern, but for the time being we can discuss just the ones we do agree on. There's no doubt about APIs and SPIs.


Solution
------------

So how do we make this as painless as possible for us and our users? The best way is to keep the surface area of the API or SPI small, create solid boundaries, and avoid exposing implementation details and implementation classes.
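To make "implementation hiding" concrete, here is a minimal Java sketch. The names (DeleteRequest, Messages, newDeleteRequest) are hypothetical and are not the actual shared-ldap API. Clients compile only against the interface and a static factory, while the implementation is a private nested class they cannot name, subclass, or depend on.

```java
import java.util.Objects;

/** Hypothetical public API type: the only thing clients compile against. */
interface DeleteRequest {
    int getMessageId();

    String getName();
}

/** Hypothetical public entry point; the implementation class stays private. */
final class Messages {
    private Messages() {
        // static factory holder, not instantiable
    }

    public static DeleteRequest newDeleteRequest(int messageId, String name) {
        return new DeleteRequestImpl(messageId, name);
    }

    // Hidden implementation: no code outside Messages can reference this type,
    // so it can be renamed, restructured, or optimized without breaking clients.
    private static final class DeleteRequestImpl implements DeleteRequest {
        private final int messageId;
        private final String name;

        DeleteRequestImpl(int messageId, String name) {
            this.messageId = messageId;
            this.name = Objects.requireNonNull(name, "name");
        }

        @Override
        public int getMessageId() {
            return messageId;
        }

        @Override
        public String getName() {
            return name;
        }

        @Override
        public String toString() {
            return "DeleteRequest(" + messageId + ", " + name + ")";
        }
    }
}
```

In real modules the same effect would come from package boundaries (package-private classes) or, across jars, from bundle-level package exports; the private nested class is just the smallest way to show the idea in a single file.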

By reducing the surface area through implementation hiding, we limit exposure and reduce the probability of having to make a change that breaks our contract with users. You might be asking: what's a real-world example of this for us in shared?

Incidentally, this is one of the things I've been working on in my branch.


Real World Example in Shared
--------------------------------------------

Let's take the o.a.d.s.ldap.message package in the shared-ldap module as an example. It contains classes and interfaces modeling LDAP requests and responses, e.g. AddRequest, DeleteResponse, etc.

In this package, in addition to the request and response interfaces, we expose their implementation classes. These implementation classes, in turn, depend on the o.a.d.s.ldap.codec.* packages, because some of them rely on codec functionality, which is an implementation detail. This may be due to eager reuse, or to utility methods having been added to codec classes for convenience. Some of these dependencies can be removed by moving non-implementation-specific methods and constants out of the codec classes into utility classes outside the package, or outside the module altogether. Furthermore, the codec implementation that handles [de]marshaling has to access package-friendly (non-API) methods on the implementation classes while encoding.
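As an illustration of removing the codec's need for package-friendly access, the sketch below has the encoder read only the public getters of a hypothetical DeleteRequest interface, so it works with any implementation, hidden or not. It hand-encodes an LDAPMessage carrying a DelRequest ([APPLICATION 10], an octet string holding the DN) following the BER rules RFC 4511 uses; it omits controls and is not the shared-ldap codec.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical public interface; the encoder needs nothing beyond these getters. */
interface DeleteRequest {
    int getMessageId();

    String getName();
}

/** Sketch of a BER encoder for an LDAPMessage wrapping a DelRequest (no controls). */
final class DeleteRequestEncoder {
    private static final int TAG_SEQUENCE = 0x30;    // universal, constructed SEQUENCE
    private static final int TAG_INTEGER = 0x02;     // universal, primitive INTEGER
    private static final int TAG_DEL_REQUEST = 0x4A; // [APPLICATION 10], primitive

    private DeleteRequestEncoder() {
    }

    /** Encodes using only the public interface, never implementation internals. */
    static byte[] encode(DeleteRequest request) {
        byte[] messageId = tlv(TAG_INTEGER, minimalInteger(request.getMessageId()));
        byte[] protocolOp = tlv(TAG_DEL_REQUEST,
                request.getName().getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.write(messageId, 0, messageId.length);
        body.write(protocolOp, 0, protocolOp.length);
        return tlv(TAG_SEQUENCE, body.toByteArray());
    }

    /** Tag, definite length, value. */
    private static byte[] tlv(int tag, byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(tag);
        writeLength(out, value.length);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    /** Short form below 128, otherwise long form with the minimum number of octets. */
    private static void writeLength(ByteArrayOutputStream out, int length) {
        if (length < 0x80) {
            out.write(length);
            return;
        }
        int octets = length > 0xFFFFFF ? 4 : length > 0xFFFF ? 3 : length > 0xFF ? 2 : 1;
        out.write(0x80 | octets);
        for (int i = octets - 1; i >= 0; i--) {
            out.write((length >>> (8 * i)) & 0xFF);
        }
    }

    /** Two's-complement big-endian INTEGER content with redundant leading octets removed. */
    private static byte[] minimalInteger(int value) {
        byte[] full = {
            (byte) (value >>> 24), (byte) (value >>> 16), (byte) (value >>> 8), (byte) value
        };
        int start = 0;
        while (start < 3
                && ((full[start] == 0 && (full[start + 1] & 0x80) == 0)
                    || (full[start] == (byte) 0xFF && (full[start + 1] & 0x80) != 0))) {
            start++;
        }
        byte[] result = new byte[4 - start];
        System.arraycopy(full, start, result, 0, result.length);
        return result;
    }
}
```

Because encode takes only a DeleteRequest, the codec no longer forces the message implementation classes to be exported just so it can reach their package-friendly methods.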

In the end, these chains of transitive dependencies force us to expose almost all of the implementation classes in shared, even though most of them could easily be decoupled and hidden. Effectively, everything in shared ends up in one big heap that exposes way more than we want.


LDAP Client API
------------------------

Everyone agrees that it is very important to get this API right for 1.0. Right now it pulls in several public interfaces directly from shared, and those interfaces in turn pull in some implementation classes. This way the logical API extends into shared, and effectively the majority of shared is exposed through the client API. The client API does not end at its jar boundary.

When all the implementation details are wide open and part of the client API, this exposure increases the chance that the API will have to change, and that is what I'm trying to limit. We can decouple these dependencies very nicely with a mixed bag of refactoring techniques while breaking shared-ldap up into smaller, more coherent modules. The idea is to expose only the bare minimum we need to. Yes, the shared code has become very stable over time, but the interfaces are where most of that stability lies. If we expose only the interfaces instead of the implementation classes, we'll have an awesome API that may remain 1.x for a while without requiring deprecations as new functionality is introduced.


Finishing Up the Example
-------------------------------------

So what concrete things can we do?

The biggest step is to hide as many of the implementation classes as possible. In my experimental branch I started by:

    (1) Moving methods and constants out of the codec classes where they caused unnecessary dependencies from message package classes and interfaces. There was even a case where StringTools depended on codec classes, and since virtually everything doing string-related operations used StringTools, this caused many interdependencies. It becomes a web of dependencies across packages.

    (2) Breaking shared up into multiple Maven modules, so there are now the following modules:

          o shared-util
          o asn1-api
          o asn1-ber
          o ldap-model
                 - name pkg
                 - message pkg (no impl classes)
                 - schema pkg
                 - cursor pkg
                 - filter pkg
                 - entry pkg
                 - constants pkg
          o ldap-codec (not complete)  
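For step (1), the goal is for low-level helpers to sit at the bottom of the dependency graph. Here is a hypothetical sketch of such a helper, the kind of class that could live in shared-util: it imports only JDK classes, so anything using it for string handling no longer drags codec packages along. The class and method names are illustrative, not the actual StringTools API.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical leaf utility for shared-util: it imports only JDK classes,
 * so depending on it never pulls in codec (or any other LDAP) packages.
 */
final class LdapStrings {
    private LdapStrings() {
    }

    /** LDAP strings are UTF-8 encoded on the wire (RFC 4511). */
    static byte[] toUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromUtf8Bytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /**
     * Lowercases ASCII letters only, independent of the default locale
     * (String.toLowerCase() without a Locale misbehaves under e.g. Turkish).
     */
    static String toLowerAscii(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c >= 'A' && c <= 'Z') {
                chars[i] = (char) (c + ('a' - 'A'));
            }
        }
        return new String(chars);
    }
}
```

The design choice is the dependency direction: message, codec, and model classes may all depend on a class like this, but it depends on none of them.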

The next step would be to turn these artifacts into OSGi bundles. There will be nothing special about it: I'm just going to leverage bundle packaging and its explicit package exports to hide implementation classes, which you cannot do as easily with regular jars.

Once this is done, we can export a minimal set of classes from the codec, hide the remainder, and make the model interfaces the primary dependency of the client API, keeping the API weight (surface area) down without exposing implementation classes.

There's a lot more to do; the job is about 40% complete. Waiting for the AP merge makes this work feel moot, since the merge is going to be nasty, so I might just redo it after Emmanuel merges. That lets me be a bit more aggressive and experimental for now.

Plus, if Pierre and Seelman opt for m2eclipse+Maven+Tycho (as Jesse mentioned) for the Studio build, then redoing these refactorings a second time will not require manual fixes in Studio, which now depends on shared. I can refactor Studio at the same time.


Conclusions
-----------------

So this example shows some of the things we can do to tighten things up and manage our APIs better. We can then change the implementation however we like to fix bugs and improve performance in point releases, without impacting the minimal interfaces we expose in the API.
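To illustrate that point-release freedom, here is a hypothetical sketch: callers hold only a MessageIdGenerator interface (an illustrative name, not an existing class), so a later point release could replace a synchronized counter with a lock-free AtomicInteger version without any client code changing. The generator starts at 1 and wraps back to 1, since RFC 4511 reserves message ID 0 for unsolicited notifications.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical public interface: all that client code ever sees. */
interface MessageIdGenerator {
    /** Returns the next request message ID, always in 1..Integer.MAX_VALUE. */
    int next();
}

/** Hypothetical factory; which implementation it returns is a private detail. */
final class MessageIds {
    private MessageIds() {
    }

    static MessageIdGenerator newGenerator() {
        // A 1.0.0 release might have returned new SynchronizedGenerator();
        // a 1.0.1 point release can switch to the lock-free one transparently.
        return new AtomicGenerator();
    }

    /** Original implementation: correct, but every call takes a lock. */
    private static final class SynchronizedGenerator implements MessageIdGenerator {
        private int last;

        @Override
        public synchronized int next() {
            last = (last == Integer.MAX_VALUE) ? 1 : last + 1;
            return last;
        }
    }

    /** Replacement implementation: same contract, lock-free via compare-and-set. */
    private static final class AtomicGenerator implements MessageIdGenerator {
        private final AtomicInteger last = new AtomicInteger();

        @Override
        public int next() {
            // 0 is reserved for unsolicited notifications, so wrap to 1.
            return last.updateAndGet(i -> i == Integer.MAX_VALUE ? 1 : i + 1);
        }
    }
}
```

Client code written against MessageIdGenerator keeps working unchanged across such a release; only the hidden class behind the factory differs.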

We can take similar steps inside the server to restrict the exposed SPI; however, using OSGi is probably not going to be an option there right away, since it gets more complicated. Here in shared I would use bundle packaging just to hide implementation classes, not to define services, etc.

Also, some classes that were proposed for shared, e.g. DnNode, are at this point specific to the server. Sure, Studio might use these classes eventually, but they are not generic LDAP. They can stay in shared, but they should be kept in a module separate from, for example, ldap-model. Why, you may ask? Because unlike generic LDAP classes such as Entry, Dn, or Cursor, they are not needed by every client, nor are they viable for every server a client connects to. They only serve a purpose when used in Studio connecting to ApacheDS.

Studio might need DnNode in the future for a plugin and widget that let users graphically manage the boundaries of administrative areas; however, it's not something every client needs, and it certainly is not needed by a generic client connecting to any server.

So classes like this, as well as the interfaces and classes that model ApacheDS-specific features also used by Studio, should live in their own modules (if kept in shared), separate from the model and codec bundles. This way they can remain in shared and be used by both Studio and ApacheDS without polluting the client API. For example, the ACI mechanism we use is very ApacheDS-specific and is used by Studio's ACI editor. I wanted to say X.500-specific, but we've changed our ACIs a tiny bit. So we might have an ldap-aci module that pulls these things out of ldap-model, so that our standard client API remains clean and light, free of our ApacheDS-specific features.

The power behind this API is the number of people and projects that will use it. We don't want the OpenDS folks, for example, to avoid it just because they don't want our ApacheDS-specific interfaces weighing it down and contaminating it. I'd also love to see the API used with a light footprint on mobile devices, so footprint will matter in this oddball case as well.

-- 
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu