Date: Tue, 10 Apr 2012 13:38:53 +0930
Subject: Re: How to get access to ALL the data in maven central?
From: Barrie Treloar
To: Maven Users List, rwheeler@artifact-software.com

On Tue, Apr 10, 2012 at 12:31 PM, Ron Wheeler wrote:
> You are going to be missing the key ingredient which is the application POMs
> that tell you what artifacts are actually used.
>
> You might get some interesting information about things like log4j which is
> probably used by lots of things inside Maven Central.
> You will be grossly misled about the use of things like CXF since it is
> hardly ever called by a library that would be submitted to Maven Central but
> is frequently used by project that are in private repositories.
>
> You may be able to visualize a "where used" between libraries but you will
> have a lot of nodes that are "never used" which is not true.
>
> You will have to figure out a way to separate projects that are still used
> and produced a ton of revisions 5 years ago but nothing since, from projects
> that are mature yet still active but only produce new versions every 18
> months since they are stable and work, from projects that were very active
> and then died as they became unnecessary due to newer technologies being
> introduced.
>
> You will also have trouble with projects that repackage their artifacts
> between major releases and change the GAV structure by redistributing the
> functionality.
>
> Not sure that your project is going to produce any useful information and I
> fear that it will be misleading to anyone who does not look deeper into the
> raw data.
>
> Visualization may just make it easier for incorrect conclusions to be
> developed.
>
> Ron
[del]
>> 457GB is a lot of data, but it isn't an unimaginable amount, and most of
>> that is no doubt the artifacts, not the metadata (pom files).
[del]

Assuming that you have listened to Ron's reasoning but are going to go ahead anyway:

The 457GB would be the jar sizes. The POMs themselves wouldn't be that big.

Maven Central isn't directly web browsable any more, but you could use the mirror at http://mirrors.ibiblio.org/pub/mirrors/maven2/

If you wanted to scrape Maven Central for just the POMs then I'd contact Sonatype, who manage the central repository.
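
If you do end up walking a browsable mirror yourself, the core step is picking the subdirectory and .pom links out of each directory-listing page. Here is a rough Python sketch of that one step. It assumes the mirror serves plain HTML index pages with relative links, which I have not verified, and the names ListingParser and split_listing are made up for this example. It does no network access on its own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collects href values from the <a> tags of a directory-listing page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_listing(base_url, html):
    """Return (subdirectory URLs, .pom URLs) found in one listing page.

    Assumption: entries are relative links, directories end in "/".
    """
    parser = ListingParser()
    parser.feed(html)
    dirs, poms = [], []
    for href in parser.hrefs:
        # Skip sort links, parent-directory links, and absolute links.
        if href.startswith(("?", "/", "..")) or "://" in href:
            continue
        url = urljoin(base_url, href)
        if href.endswith("/"):
            dirs.append(url)
        elif href.endswith(".pom"):
            poms.append(url)
    return dirs, poms
```

A crawler would fetch each returned directory URL, call split_listing on the page, and repeat, downloading only the .pom URLs and skipping the jars. Throttle the requests, and check with whoever runs the repository or mirror before crawling at any scale.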