I think the primary focus is grouping packages with the following rules:
1) Group packages that are strongly connected
2) Start with the largest group and try to merge into it any groups that do not cause additional
dependencies
3) Repeat for all remaining groups
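A minimal sketch of the three steps above, in Python. It assumes the analysis report can be reduced to a map from each project package to the set of packages it imports. The names `strongly_connected_groups`, `external_imports` and `merge_groups` are hypothetical, and "does not cause additional dependencies" is read here as "the merged group imports nothing the larger group did not already import".

```python
def strongly_connected_groups(deps):
    """Step 1: group project packages that depend on each other in a cycle (Tarjan)."""
    index, low, on_stack, stack, groups = {}, {}, set(), [], []

    def visit(pkg):
        index[pkg] = low[pkg] = len(index)
        stack.append(pkg)
        on_stack.add(pkg)
        for dep in deps[pkg]:
            if dep not in deps:          # external import, not a project package
                continue
            if dep not in index:
                visit(dep)
                low[pkg] = min(low[pkg], low[dep])
            elif dep in on_stack:
                low[pkg] = min(low[pkg], index[dep])
        if low[pkg] == index[pkg]:       # pkg is the root of a component
            group = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.add(member)
                if member == pkg:
                    break
            groups.append(group)

    for pkg in sorted(deps):
        if pkg not in index:
            visit(pkg)
    return groups


def external_imports(group, deps):
    """Everything the group needs that it does not contain itself."""
    return {dep for pkg in group for dep in deps[pkg]} - group


def merge_groups(deps):
    """Steps 2 and 3: grow the largest group first, then repeat for the rest."""
    groups = sorted(strongly_connected_groups(deps),
                    key=lambda g: (-len(g), sorted(g)))
    merged = []
    while groups:
        base = groups.pop(0)             # largest remaining group
        changed = True
        while changed:
            changed = False
            for other in list(groups):
                before = external_imports(base, deps)
                after = external_imports(base | other, deps)
                if after <= before:      # merge adds no new dependency
                    base = base | other
                    groups.remove(other)
                    changed = True
        merged.append(base)
    return merged


# Hypothetical project: package -> packages it imports
deps = {
    "app.core": {"app.util", "org.log"},
    "app.util": {"app.core"},
    "app.io": {"org.log"},
    "app.xml": {"org.xml"},
}
bundles = merge_groups(deps)
```

In this example `app.io` joins the `app.core`/`app.util` cycle because it only needs `org.log`, which that group already imports, while `app.xml` stays on its own because merging it would add `org.xml`.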
I think you need a cost/entropy function to calculate the optimum division. If you group packages
A and B, the cost is zero when A and B have identical imports. Even if A and B are not cohesive,
they could be placed in the same bundle because they do not drag in additional dependencies.
So the interesting question is what the cost is when B adds one additional import. Is it
worth it? Or not?
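One possible way to make that cost concrete, as a sketch only: count the imports that one side needs and the other does not. The function names and the `budget` parameter are assumptions for illustration; the right threshold is exactly the open question above.

```python
def merge_cost(imports_a, imports_b):
    """Imports needed by only one of the two packages (symmetric difference).

    Zero when both packages have identical imports, one when B needs
    a single import that A does not.
    """
    return len(set(imports_a) ^ set(imports_b))


def worth_merging(imports_a, imports_b, budget=0):
    """Hypothetical policy: merge when the cost stays within a chosen budget."""
    return merge_cost(imports_a, imports_b) <= budget


a = {"org.log", "org.xml"}
b = {"org.log", "org.xml", "org.http"}
cost_same = merge_cost(a, a)
cost_extra = merge_cost(a, b)
```

With `budget=0` only packages with identical imports are grouped; raising the budget trades extra dependencies for fewer, larger bundles.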
Part of that analysis is to check whether you could do more grouping if classes were moved
from their packages to other (new?) packages. Sometimes only one class in a package makes
that package ungroupable. I think there should be the concept of a "dependency
cost". If X is imported by 15 packages and 254 classes, you are likely getting your money's
worth for that dependency. However, if a single class drags in dependencies
that nobody else uses, that class is likely expensive. It will be interesting to see how much
of this we can automate, but I expect you need people to look at the details.
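The "dependency cost" idea can be sketched as a usage count per imported package. This assumes the class-level report gives, for each fully qualified class name, the set of packages it imports; `dependency_usage` and `expensive_classes` are hypothetical names, and "expensive" is simplified to "the only class that uses some dependency".

```python
from collections import defaultdict


def dependency_usage(class_imports):
    """Map each imported package to (importing packages, importing classes)."""
    users = defaultdict(set)
    for cls, imports in class_imports.items():
        for dep in imports:
            users[dep].add(cls)
    return {
        # The package of a class is everything before its last dot.
        dep: (len({cls.rsplit(".", 1)[0] for cls in classes}), len(classes))
        for dep, classes in users.items()
    }


def expensive_classes(class_imports):
    """Classes that are the sole user of at least one dependency."""
    usage = dependency_usage(class_imports)
    result = {}
    for cls, imports in class_imports.items():
        lonely = sorted(dep for dep in imports if usage[dep][1] == 1)
        if lonely:
            result[cls] = lonely
    return result


# Hypothetical report: class -> packages it imports
class_imports = {
    "app.core.Engine": {"org.log"},
    "app.core.Config": {"org.log"},
    "app.io.Reader": {"org.log"},
    "app.io.SpringBridge": {"org.log", "org.springframework"},
}
usage = dependency_usage(class_imports)
candidates = expensive_classes(class_imports)
```

Here `org.log` is used by 2 packages and 4 classes, so it earns its keep, while `app.io.SpringBridge` is flagged because nothing else uses `org.springframework`. Such a list is only a starting point for a person to review.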
One of the biggest modularity problems usually comes from bridge classes, i.e. someone
has a library doing X but wants to make it available to, for example, Spring. There are usually
then a few classes bridging the library to the Spring world, and these can be extremely expensive.
For example, bnd is coupled to Ant, but I made sure that coupling lives in a separate package.
This all seems closely related to the concept of entropy and it might be interesting to take
a look at Shannon et al. You have to find a decomposition that has minimum entropy where entropy
is somehow defined in terms of imports versus contents. You want to group as much as possible
while minimizing the connections between the groups. Again, this normally means you need a
cost function and then optimize against it.
However, start with the mechanical grouping and apply that idea to open source projects to see
what this would look like. If you could calculate the "entropy" of existing bundles, that would
also be very interesting.
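As a starting point for measuring existing bundles, here is a deliberately crude stand-in for "imports versus contents": the number of imported packages per contained package. It is not Shannon entropy, and the names `bundle_score` and `decomposition_score` are assumptions for illustration.

```python
def bundle_score(contents, imports):
    """Imported packages per contained package; lower means more content per connection."""
    if not contents:
        raise ValueError("bundle has no contents")
    return len(imports) / len(contents)


def decomposition_score(bundles):
    """Average score over all bundles; bundles maps name -> (contents, imports)."""
    return sum(bundle_score(c, i) for c, i in bundles.values()) / len(bundles)


# Hypothetical bundles: name -> (contained packages, imported packages)
bundles = {
    "core": ({"app.core", "app.util", "app.io", "app.model"}, {"org.log", "org.xml"}),
    "bridge": ({"app.spring"}, {"org.log", "org.springframework", "app.core"}),
}
core_score = bundle_score(*bundles["core"])
bridge_score = bundle_score(*bundles["bridge"])
overall = decomposition_score(bundles)
```

A small bridge bundle with many imports scores high, which matches the intuition that bridge classes are expensive. A real measure would probably need to weight imports by how widely they are shared.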
Kind regards,
Peter Kriens
On 8 jun 2011, at 10:35, Tiger Gui wrote:
> Hi Peter,
>
> I am working on the design and implementation of a source code
> dependency analysis algorithm, and I will finish the whole algorithm in the
> coming month. The algorithm has two sections: package and class.
>
> 1. Package section
>
> a. Detect package cycles in the project source code
> b. Determine all the packages that each package requires
> c. Report which packages use each package
>
> 2. Class section
>
> a. Detect all the class cycles in the project source
> code (for example A > B > C > A)
> b. Determine all the classes that each class requires (for example, it
> can tell us that class A uses classes B, C and D)
> c. Report which classes use each class (for example, it can tell us
> that class A is used by classes B and C)
>
> After we get the source code analysis report, we should split the
> project into several OSGi bundles, so the problem is how we should
> split the project according to the report.
>
> My initial opinion:
>
> A. Classes in a cycle should be in the same bundle.
> B. Classes (or interfaces) that are used by many other classes, but
> do not require any other class, can be in the same bundle. (Usually,
> these are basic interfaces or abstract classes.) These classes are usually
> the ones that define the API.
>
> I am clear about these two situations, but there must be many
> others. So, what do you advise?
>
> 
> Best Regards
> 
> Tiger Gui [tigergui1990@gmail.com]
