community-dev mailing list archives

From Karanjeet Singh <karan...@usc.edu>
Subject Re: DRAT is now scanning Apache SVN code base!
Date Wed, 03 Feb 2016 08:58:42 GMT
Thanks Pierre for your feedback.

Yes, the visualization corresponds to only 133 / 191 SVN projects (
http://svn.apache.org/repos/asf/). We have successfully audited close to
175 projects, and we hope to cover the remaining ones by the end of this
week. We will update the data once that is done.

Large repositories like "subversion" and "camel", with 493,420 files
(approx. 9,723 MB) and 519,584 files (approx. 1,922 MB) respectively, take
only up to 36 hours to complete, which is quite a good number.

For your second question, I don't have an answer yet. Our intention is to
update this regularly, but there is a limitation on the Wrangler end: it
doesn't allow us to run a job for more than 48 hours. Therefore, for very
large repositories like openoffice, spamassassin, myfaces, etc., which take
more time to audit, it will be a challenge to split the repositories every
time we scan.
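
To give a rough sense of the splitting problem, here is a minimal sketch. It
is not part of DRAT. It assumes audit time scales linearly with file count
(the sizes above suggest that is only approximate), uses the "camel" figures
with the 36-hour upper bound as the throughput, and applies them to a
hypothetical 2,000,000-file repository. The function names are made up for
illustration.

```python
import math

WALLTIME_LIMIT_HOURS = 48  # per-job limit on Wrangler mentioned above


def files_per_hour(file_count, hours):
    """Audit throughput observed for one completed run."""
    return file_count / hours


def chunks_needed(file_count, rate, limit_hours=WALLTIME_LIMIT_HOURS):
    """Minimum number of separate jobs so that each fits inside the limit.

    Assumes (hypothetically) that audit time is proportional to file count.
    """
    estimated_hours = file_count / rate
    return max(1, math.ceil(estimated_hours / limit_hours))


# Throughput from the "camel" figures: 519,584 files in at most 36 hours.
rate = files_per_hour(519_584, 36)

# Hypothetical repository size, not a measured figure.
print(chunks_needed(2_000_000, rate))
```

In practice the split would have to follow directory boundaries in the
repository, so the real number of jobs could be higher than this estimate.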

Best Regards,
Karanjeet Singh
CS Graduate Student
University of Southern California
karanjes@usc.edu | +1-213-675-9583


On Wed, Feb 3, 2016 at 12:06 AM, Pierre Smits <pierre.smits@gmail.com>
wrote:

> Hi Karanjeet,
>
> This is surely an impressive piece of work. But I still notice that some
> projects are missing in the overview. Is this a mere PoC not intended to be
> complete? Or something that will be made available to all and be updated
> regularly?
>
> Best regards,
>
> Pierre Smits
>
> ORRTIZ.COM <
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.orrtiz.com&d=CwIBaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=u7neGGUaVmQKNSLUqJ9zpA&m=I4VmXy1BbrwbVZc9758zYzQ1Vg_gsve4ety_zu60Z7o&s=rey8QvJVsx9VER8tfbyqcWeBc3x1dze3BDFEgOry1zo&e=
> >
> OFBiz based solutions & services
>
> OFBiz Extensions Marketplace
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__oem.ofbizci.net_oci-2D2_&d=CwIBaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=u7neGGUaVmQKNSLUqJ9zpA&m=I4VmXy1BbrwbVZc9758zYzQ1Vg_gsve4ety_zu60Z7o&s=t-3eq7_jE8P3hTlTBYAQB9p_vFHuwoj6RqdbBBr8edI&e=
>
> On Wed, Feb 3, 2016 at 2:39 AM, Lewis John Mcgibbney <
> lewis.mcgibbney@gmail.com> wrote:
>
> > Hi Karanjeet,
> >
> > A good bunch of work has already gone into this and it is looking really
> > friggin smart indeed.
> > Interesting to see so many pieces of software come together and result in
> > something very easy to interpret.
> > Good work.
> > Lewis
> >
> > On Mon, Feb 1, 2016 at 11:44 PM, <dev-digest-help@community.apache.org>
> > wrote:
> >
> > > Hello Everyone,
> > >
> > > With great pleasure, I would like to introduce DRAT (Distributed Release
> > > Audit Tool) which is a distributed, parallelized wrapper around Apache RAT
> > > to inspect for appropriate open source licensing in software projects.
> > > DRAT was started by my advisor, Chris Mattmann, in an effort to get RAT
> > > working on a very large code base. RAT uses Apache OODT, Apache Tika, and
> > > Apache Solr.
> > >
> > > We are now auditing the complete Apache SVN code base to check for proper
> > > licenses. Until now, we have scanned 171 / 191 repositories and
> > > illustrated the statistics for 133 of them through D3 visualization
> > > located at
> > > https://urldefense.proofpoint.com/v2/url?u=http-3A__drat.dyndns.org-3A8080_dratviz&d=CwIBaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=u7neGGUaVmQKNSLUqJ9zpA&m=I4VmXy1BbrwbVZc9758zYzQ1Vg_gsve4ety_zu60Z7o&s=EiqoixInVvAF49_1n7AxSu4q_q7BYMJ53JbVnf7rWK4&e=
> > >
> > > Projects should check out the MIME analysis of the code base and click
> > > around. Please also note due to the sheer size of the Apache code bases
> > > and the fact that we scanned and included all revisions in the Apache SVN
> > > repo, DRAT is not running in real time. We are running DRAT on the NSF
> > > Super Computer Wrangler, which has a petabyte of flash storage and the
> > > ability to stand up Hadoop and Spark clusters. We are also working on a
> > > paper describing our results.
> > >
> > > Please send feedback to me (Karanjeet Singh <karanjes@usc.edu>),
> > > Professor Mattmann <mattmann@usc.edu> and/or irds-L@mymaillists.usc.edu.
> > >
> > > Thanks & Regards,
> > > Karanjeet Singh
> > > C.S. Graduate Student
> > > University of Southern California
> > > karanjes@usc.edu | +1-213-675-9583
> >
>
