maven-users mailing list archives

From Barrie Treloar <baerr...@gmail.com>
Subject Re: How to get access to ALL the data in maven central?
Date Tue, 10 Apr 2012 04:08:53 GMT
On Tue, Apr 10, 2012 at 12:31 PM, Ron Wheeler
<rwheeler@artifact-software.com> wrote:
> You are going to be missing the key ingredient which is the application POMs
> that tell you what artifacts are actually used.
>
> You might get some interesting information about things like log4j which is
> probably used by lots of things inside Maven Central.
> You will be grossly misled about the use of things like CXF since it is
> hardly ever called by a library that would be submitted to Maven Central but
> is frequently used by projects that are in private repositories.
>
> You may be able to visualize a "where used" between libraries but you will
> have a lot of nodes that are "never used" which is not true.
>
> You will have to figure out a way to separate projects that are still used
> and produced a ton of revisions 5 years ago but nothing since, from projects
> that are mature yet still active but only produce new versions every 18
> months since they are stable and work, from projects that were very active
> and then died as they became unnecessary due to newer technologies being
> introduced.
>
> You will also have trouble with projects that repackage their artifacts
> between major releases and change the GAV structure by redistributing the
> functionality.
>
> Not sure that your project is going to produce any useful information and I
> fear that it will be misleading to anyone who does not look deeper into the
> raw data.
>
> Visualization may just make it easier for incorrect conclusions to be
> developed.
>
> Ron
[del]
>> 457GB is a lot of data, but it isn't an unimaginable amount, and most of
>> that is no doubt the artifacts, not the metadata (pom files).
[del]

Assuming you have listened to Ron's reasoning and are going to go
ahead anyway: the 457GB would be mostly the jars.
The POMs themselves wouldn't be that big.

Maven Central isn't directly web browsable any more, but you could use
the mirror at http://mirrors.ibiblio.org/pub/mirrors/maven2/
If you wanted to scrape Maven Central for just the POMs, then I'd
contact Sonatype, who manage the central repository.
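
Once you have the POM files locally, pulling out the "where used"
edges is mostly XML parsing. Below is a minimal Python sketch of that
step, not a complete tool. The function name parse_pom and the sample
POM (com.example:demo-lib) are made up for illustration. It assumes
each POM declares the standard http://maven.apache.org/POM/4.0.0
namespace, so older POMs without a namespace would need separate
handling. It does not resolve ${...} properties or versions supplied
through dependencyManagement.

```python
import xml.etree.ElementTree as ET

# Standard namespace declared by Maven 2+ POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical POM used only to exercise the function below.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>demo-lib</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.16</version>
    </dependency>
  </dependencies>
</project>"""


def parse_pom(pom_xml):
    """Return ((groupId, artifactId, version), [dependency GAVs]) for one POM.

    Values are returned as written in the file; missing elements become None.
    """
    root = ET.fromstring(pom_xml)

    def child_text(node, tag):
        # Text of a direct child element, or None if it is absent or empty.
        child = node.find(f"m:{tag}", POM_NS)
        if child is None or child.text is None:
            return None
        return child.text.strip()

    parent = root.find("m:parent", POM_NS)

    def own_or_inherited(tag):
        # groupId and version may be omitted and inherited from <parent>.
        value = child_text(root, tag)
        if value is None and parent is not None:
            value = child_text(parent, tag)
        return value

    gav = (
        own_or_inherited("groupId"),
        child_text(root, "artifactId"),
        own_or_inherited("version"),
    )

    # Only direct <dependencies> entries; <dependencyManagement> is ignored.
    deps = [
        (
            child_text(dep, "groupId"),
            child_text(dep, "artifactId"),
            child_text(dep, "version"),
        )
        for dep in root.findall("m:dependencies/m:dependency", POM_NS)
    ]
    return gav, deps


if __name__ == "__main__":
    gav, deps = parse_pom(SAMPLE_POM)
    print(gav, "->", deps)
```

Each (gav, dependency) pair is one edge in the graph. Keep Ron's
caveats in mind: this only sees what is published to the repository,
so application POMs in private repositories never show up as users.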
