community-dev mailing list archives

From Don Cunningham <otto...@gmail.com>
Subject Re: DRAT is now scanning Apache SVN code base!
Date Wed, 03 Feb 2016 09:07:45 GMT
On Feb 3, 2016 4:06 AM, "Tony Stevenson" <tony@pc-tony.com> wrote:

> cc += infra@
>
> Karanjeet,
>
> I am writing to you whilst wearing my Infrastructure hat.
>
> Please be careful if you are indeed recursing the entire ASF subversion
> repository (http://svn.apache.org), as you will quite likely run into
> the auto-banning service.
> Have you seen https://svn-dump.apache.org ? This is an entire dump of
> the SVN repo (at least the public one you are interested in). You can use
> this, and it is updated monthly. If you really need fully up-to-date data,
> you can use the dump and svnsync the remaining revisions.
>
> I guess this might be obvious, but I’ll mention it just in case. A lot of
> projects are using git repositories too, which are mirrored here:
> github.com/apache/
>
>
>
> --
> Cheers,
> Tony
>
> On behalf of the Apache Infrastructure Team
>
> -----------------------
> http://www.pc-tony.com
> GPG - 3072D/2543E323
> -----------------------
>
> > On 3 Feb 2016, at 08:58, Karanjeet Singh <karanjes@usc.edu> wrote:
> >
> > Thanks Pierre for your feedback.
> >
> > Yes, the visualization corresponds to only 133 / 191 SVN projects (
> > http://svn.apache.org/repos/asf/). We have successfully audited close to
> > 175 projects and hopefully by the end of this week all the remaining
> > projects should be covered. We will update the data once done.
> >
> > Large repositories like "subversion" and "camel", with 493,420 files
> > (about 9,723 MB) and 519,584 files (about 1,922 MB) respectively, take
> > only up to 36 hours to complete, which is quite a good number.
> >
> > For your second question, I don't have an answer yet. Our intention is
> > to update this regularly, but we have a limitation at the Wrangler end:
> > it doesn't allow us to run a job for more than 48 hours. Therefore,
> > for very large repositories like openoffice, spamassassin, myfaces, etc.,
> > which take more time to audit, it will be a challenge to split the
> > repositories every time and scan them.
> >
> > Best Regards,
> > Karanjeet Singh
> > CS Graduate Student
> > University of Southern California
> > karanjes@usc.edu | +1-213-675-9583
> >
> >
> > On Wed, Feb 3, 2016 at 12:06 AM, Pierre Smits <pierre.smits@gmail.com>
> > wrote:
> >
> >> Hi Karanjeet,
> >>
> >> This is surely an impressive piece of work. But I still notice that some
> >> projects are missing in the overview. Is this a mere PoC not intended to be
> >> complete? Or something that will be made available to all and be updated
> >> regularly?
> >>
> >> Best regards,
> >>
> >> Pierre Smits
> >>
> >> ORRTIZ.COM <
> >>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.orrtiz.com&d=CwIBaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=u7neGGUaVmQKNSLUqJ9zpA&m=I4VmXy1BbrwbVZc9758zYzQ1Vg_gsve4ety_zu60Z7o&s=rey8QvJVsx9VER8tfbyqcWeBc3x1dze3BDFEgOry1zo&e=
> >>>
> >> OFBiz based solutions & services
> >>
> >> OFBiz Extensions Marketplace
> >>
> >>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__oem.ofbizci.net_oci-2D2_&d=CwIBaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=u7neGGUaVmQKNSLUqJ9zpA&m=I4VmXy1BbrwbVZc9758zYzQ1Vg_gsve4ety_zu60Z7o&s=t-3eq7_jE8P3hTlTBYAQB9p_vFHuwoj6RqdbBBr8edI&e=
> >>
> >> On Wed, Feb 3, 2016 at 2:39 AM, Lewis John Mcgibbney <
> >> lewis.mcgibbney@gmail.com> wrote:
> >>
> >>> Hi Karanjeet,
> >>>
> >>> A good bunch of work has already gone into this and it is looking really
> >>> friggin smart indeed.
> >>> Interesting to see so many pieces of software come together and result in
> >>> something very easy to interpret.
> >>> Good work.
> >>> Lewis
> >>>
> >>> On Mon, Feb 1, 2016 at 11:44 PM, <dev-digest-help@community.apache.org>
> >>> wrote:
> >>>
> >>>> Hello Everyone,
> >>>>
> >>>> With great pleasure, I would like to introduce DRAT (Distributed Release
> >>>> Audit Tool), which is a distributed, parallelized wrapper around Apache RAT
> >>>> to inspect for appropriate open source licensing in software projects.
> >>>> DRAT was started by my advisor, Chris Mattmann, in an effort to get RAT
> >>>> working on a very large code base. DRAT uses Apache OODT, Apache Tika, and
> >>>> Apache Solr.
> >>>>
> >>>> We are now auditing the complete Apache SVN code base to check for proper
> >>>> licenses. Until now, we have scanned 171 / 191 repositories and
> >>>> illustrated the statistics for 133 of them through a D3 visualization
> >>>> located at
> >>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__drat.dyndns.org-3A8080_dratviz&d=CwIBaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=u7neGGUaVmQKNSLUqJ9zpA&m=I4VmXy1BbrwbVZc9758zYzQ1Vg_gsve4ety_zu60Z7o&s=EiqoixInVvAF49_1n7AxSu4q_q7BYMJ53JbVnf7rWK4&e=
> >>>>
> >>>> Projects should check out the MIME analysis of the code base and click
> >>>> around. Please also note that, due to the sheer size of the Apache code bases
> >>>> and the fact that we scanned and included all revisions in the Apache SVN
> >>>> repo, DRAT is not running in real time. We are running DRAT on the NSF
> >>>> supercomputer Wrangler, which has a petabyte of flash storage and the
> >>>> ability to stand up Hadoop and Spark clusters. We are also working on a
> >>>> paper describing our results.
> >>>>
> >>>> Please send feedback to me (Karanjeet Singh <karanjes@usc.edu>),
> >>>> Professor Mattmann <mattmann@usc.edu> and/or irds-L@mymaillists.usc.edu.
> >>>>
> >>>> Thanks & Regards,
> >>>> Karanjeet Singh
> >>>> C.S. Graduate Student
> >>>> University of Southern California
> >>>> karanjes@usc.edu | +1-213-675-9583
> >>>
> >>
>
>
