ant-dev mailing list archives

From Xavier Hanin <>
Subject intelligent repository cleaning (IVY-658)
Date Fri, 07 Dec 2007 19:31:57 GMT

I've started thinking about how to implement IVY-658, and it's not that easy to get the information
necessary to clean a repository intelligently.

To do so, we need information about the dependers of a module (the modules that depend on it).
While Ivy shines at finding dependees, it provides no way to access dependers, so this requires
some work.

To start with, I've developed a small prototype of a RepositoryManagementEngine, which actually
loads all repository metadata in memory. At first I didn't even consider that idea, because it
doesn't scale. But since it was the easiest thing to start with, I gave it a try, and I now
have something able to load a whole repository in memory and return information such as the
list of all modules with no dependers (the information needed for IVY-658). It could also be
extended to answer other queries quickly: once everything is in memory, that's pretty easy.
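
To illustrate the kind of in-memory depender lookup described above, here is a minimal sketch.
It is not the RepositoryManagementEngine prototype: the class DependerIndex and its methods are
hypothetical, modules are reduced to plain String ids, and it uses Java 8 collections rather
than the 1.4.2 JVM mentioned in this message.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: a reverse index from a module to its dependers.
final class DependerIndex {
    // module id -> ids of the modules that declare a dependency on it
    private final Map<String, Set<String>> dependers = new HashMap<>();
    // every module that was actually loaded from the repository, in load order
    private final Set<String> loadedModules = new LinkedHashSet<>();

    // Register a loaded module together with the ids of its direct dependencies.
    void addModule(String module, Collection<String> dependencies) {
        loadedModules.add(module);
        for (String dependency : dependencies) {
            dependers.computeIfAbsent(dependency, k -> new LinkedHashSet<>()).add(module);
        }
    }

    // Modules that directly depend on the given module (empty if none).
    Set<String> getDependers(String module) {
        return dependers.getOrDefault(module, Collections.<String>emptySet());
    }

    // Loaded modules that no other loaded module depends on.
    List<String> modulesWithNoDependers() {
        List<String> result = new ArrayList<>();
        for (String module : loadedModules) {
            if (getDependers(module).isEmpty()) {
                result.add(module);
            }
        }
        return result;
    }
}
```

Building the reverse index is a single pass over all descriptors, which is why these queries
become cheap once the whole repository is loaded.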

I'm currently running my tests on a Linux box with Sun JVM 1.4.2, accessing a repository on a
NetApp filesystem (with very good performance). On this box, loading a repository of 1200
modules and 3000 module revisions takes 40s and 60MB (memory usage is approximate; I've used
a utility class based on [1]). If I extrapolate these results, here's what I get:
revs    time    memory
3k      40"     60MB
25k     6'      500MB
100k    22'     2GB

I'm pretty happy with the time results (the environment is well suited for that, but since
it's a repository maintenance task, I guess most people could run it very close to their repository
data, overnight or over a weekend).

As expected, memory usage can become an issue more quickly. So I've done some investigation,
and it appears that ModuleRevisionId instances have a significant impact on memory usage.
Indeed, these objects are used not only to identify the loaded module revisions, but also in
each dependency descriptor to store the requested module revision.

I've found that in my use case Ivy was creating around 50k instances of ModuleRevisionId.
Since these objects are immutable, I've tried a strategy similar to String#intern() to
reuse the same instance whenever possible. This decreased the number to 6k instances,
with the in-memory repository information using 43MB in total (around 28% less).

Then I thought another area for improvement might be the dependency descriptors themselves
(around 46k instances in my test case). In DefaultDependencyDescriptor, we create the
LinkedHashMap instances used to store information when we create the object. The maps for
exclude rules, include rules and dependency artifacts are very frequently not used at all
(never in my test case). So I've changed DefaultDependencyDescriptor to initialize these
attributes only when needed, and ended up with a 31MB footprint for the whole repository. So
my new extrapolation is now:
revs    time    memory
3k      40"     31MB
25k     6'      260MB
100k    22'     1.1GB
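
The lazy initialization described above can be sketched as follows. This is not the actual
DefaultDependencyDescriptor code: the class LazyDescriptor, its single map and its String-based
rules are hypothetical simplifications meant only to show the pattern.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: the map is only allocated when a rule is first added.
final class LazyDescriptor {
    // Stays null for descriptors that never declare an exclude rule.
    private Map<String, String> excludeRules;

    void addExcludeRule(String conf, String rule) {
        if (excludeRules == null) {
            excludeRules = new LinkedHashMap<>();
        }
        excludeRules.put(conf, rule);
    }

    // Readers never see null: an unallocated map is reported as an empty one.
    Map<String, String> getExcludeRules() {
        if (excludeRules == null) {
            return Collections.<String, String>emptyMap();
        }
        return Collections.unmodifiableMap(excludeRules);
    }

    // Exposed only so the sketch can show that nothing was allocated.
    boolean hasAllocatedExcludeRules() {
        return excludeRules != null;
    }
}
```

The price is a null check in every accessor, which is the "slightly less readable" code
traded for not allocating maps that stay empty.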

So I plan to commit these changes to Ivy trunk. The changes to DefaultDependencyDescriptor
just make the code slightly less readable, so I don't think it's an issue. For ModuleRevisionId,
the change introduces a very simple cache of instances based on a WeakHashMap. It means we do
a get in a Map whenever we create a new ModuleRevisionId. I don't think it will impact
performance much, and it may even decrease the memory footprint for regular Ivy usage.
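
As a rough sketch of such an instance cache (not the code to be committed): the class
RevisionId below is a hypothetical stand-in for ModuleRevisionId, with three String fields
assumed for illustration. Wrapping the cached value in a WeakReference is also an assumption
of this sketch; without it the map value would keep its own key strongly reachable and
entries would never be released.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;

// Hypothetical immutable id whose instances are shared through a weak cache.
final class RevisionId {
    // Keys are held weakly; values are weak too, so an entry can be collected
    // once no caller references the id any more.
    private static final Map<RevisionId, WeakReference<RevisionId>> CACHE =
            new WeakHashMap<>();

    private final String organisation;
    private final String name;
    private final String revision;

    private RevisionId(String organisation, String name, String revision) {
        this.organisation = organisation;
        this.name = name;
        this.revision = revision;
    }

    // Returns the cached instance equal to the requested id, or caches a new one.
    static synchronized RevisionId newInstance(String organisation, String name,
                                               String revision) {
        RevisionId candidate = new RevisionId(organisation, name, revision);
        WeakReference<RevisionId> ref = CACHE.get(candidate);
        RevisionId cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached;
        }
        CACHE.put(candidate, new WeakReference<>(candidate));
        return candidate;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof RevisionId)) {
            return false;
        }
        RevisionId that = (RevisionId) other;
        return organisation.equals(that.organisation)
                && name.equals(that.name)
                && revision.equals(that.revision);
    }

    @Override
    public int hashCode() {
        return Objects.hash(organisation, name, revision);
    }
}
```

Each creation still allocates a short-lived candidate object to perform the lookup, but only
one instance per distinct id stays reachable afterwards.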

If you see any problem with that, feel free to let me know and we'll see how to address it.

BTW, the repository cleaning task is not done yet, just repository loading and basic analysis.


