community-dev mailing list archives

From Karanjeet Singh <karan...@usc.edu>
Subject Re: DRAT is now scanning Apache SVN code base!
Date Wed, 03 Feb 2016 09:17:07 GMT
Thanks Don and Tony.

Yes, we used the http://svn-dump.apache.org/ link to download the SVN
dump, and we are running DRAT on that.

The other link was just for reference.

I hope I am safe from the auto-banning service. :)

Best Regards,
Karanjeet Singh
C.S. Graduate Student
University of Southern California
karanjes@usc.edu | +1-213-675-9583

On Wed, Feb 3, 2016 at 1:07 AM, Don Cunningham <ottoc58@gmail.com> wrote:

> On Feb 3, 2016 4:06 AM, "Tony Stevenson" <tony@pc-tony.com> wrote:
>
>> cc += infra@
>>
>> Karanjeet,
>>
>> I am writing to you whilst wearing my Infrastructure hat.
>>
>> Please be careful if you are indeed recursing the entire ASF subversion
>> repository (http://svn.apache.org), as you will quite likely run into the
>> auto-banning service.
>> Have you seen https://svn-dump.apache.org? This is an entire dump of the
>> SVN repo (at least the public one you are interested in). You can use
>> this, and it is updated monthly. If you really need fully up-to-date data,
>> you can use the dump and svnsync the remaining revisions.
>>
>> I guess this might be obvious, but I’ll mention it just in case. A lot
>> of projects are using git repositories too, which are mirrored here:
>> github.com/apache/
>>
>>
>>
>> --
>> Cheers,
>> Tony
>>
>> On behalf of the Apache Infrastructure Team
>>
>> -----------------------
>> http://www.pc-tony.com
>> GPG - 3072D/2543E323
>> -----------------------
>>
>> > On 3 Feb 2016, at 08:58, Karanjeet Singh <karanjes@usc.edu> wrote:
>> >
>> > Thanks Pierre for your feedback.
>> >
>> > Yes, the visualization corresponds to only 133 / 191 SVN projects
>> > (http://svn.apache.org/repos/asf/). We have successfully audited close
>> > to 175 projects, and hopefully by the end of this week all the remaining
>> > projects should be covered. We will update the data once done.
>> >
>> > Large repositories like "subversion" and "camel", with 493,420 files
>> > (approx. 9,723 MB) and 519,584 files (approx. 1,922 MB) respectively,
>> > take only up to 36 hours to complete, which is quite a good number.
>> >
>> > For your second question, I don't have an answer yet. Our intention is
>> > to update this regularly, but we have a limitation at the Wrangler end:
>> > it doesn't allow us to run a job for more than 48 hours. Therefore, for
>> > very large repositories like openoffice, spamassassin, myfaces, etc.,
>> > which take more time to audit, it will be a challenge to split the
>> > repositories every time and scan.
>> >
>> > Best Regards,
>> > Karanjeet Singh
>> > CS Graduate Student
>> > University of Southern California
>> > karanjes@usc.edu | +1-213-675-9583
>> >
>> >
>> > On Wed, Feb 3, 2016 at 12:06 AM, Pierre Smits <pierre.smits@gmail.com>
>> > wrote:
>> >
>> >> Hi Karanjeet,
>> >>
>> >> This is surely an impressive piece of work. But I still notice that
>> >> some projects are missing in the overview. Is this a mere PoC not
>> >> intended to be complete? Or something that will be made available to
>> >> all and be updated regularly?
>> >>
>> >> Best regards,
>> >>
>> >> Pierre Smits
>> >>
>> >> ORRTIZ.COM <http://www.orrtiz.com>
>> >> OFBiz based solutions & services
>> >>
>> >> OFBiz Extensions Marketplace
>> >> http://oem.ofbizci.net/oci-2/
>> >>
>> >> On Wed, Feb 3, 2016 at 2:39 AM, Lewis John Mcgibbney <
>> >> lewis.mcgibbney@gmail.com> wrote:
>> >>
>> >>> Hi Karanjeet,
>> >>>
>> >>> A good bunch of work has already gone into this and it is looking
>> >>> really friggin smart indeed.
>> >>> Interesting to see so many pieces of software come together and
>> >>> result in something very easy to interpret.
>> >>> Good work.
>> >>> Lewis
>> >>>
>> >>> On Mon, Feb 1, 2016 at 11:44 PM, <dev-digest-help@community.apache.org>
>> >>> wrote:
>> >>>
>> >>>> Hello Everyone,
>> >>>>
>> >>>> With great pleasure, I would like to introduce DRAT (Distributed
>> >>>> Release Audit Tool), which is a distributed, parallelized wrapper
>> >>>> around Apache RAT to inspect for appropriate open source licensing in
>> >>>> software projects. DRAT was started by my advisor, Chris Mattmann, in
>> >>>> an effort to get RAT working on a very large code base. RAT uses
>> >>>> Apache OODT, Apache Tika, and Apache Solr.
>> >>>>
>> >>>> We are now auditing the complete Apache SVN code base to check for
>> >>>> proper licenses. So far, we have scanned 171 / 191 repositories and
>> >>>> illustrated the statistics for 133 of them through a D3 visualization
>> >>>> located at http://drat.dyndns.org:8080/dratviz
>> >>>>
>> >>>> Projects should check out the MIME analysis of the code base and
>> >>>> click around. Please also note that, due to the sheer size of the
>> >>>> Apache code bases and the fact that we scanned and included all
>> >>>> revisions in the Apache SVN repo, DRAT is not running in real time.
>> >>>> We are running DRAT on the NSF supercomputer Wrangler, which has a
>> >>>> petabyte of flash storage and the ability to stand up Hadoop and
>> >>>> Spark clusters. We are also working on a paper describing our results.
>> >>>>
>> >>>> Please send feedback to me (Karanjeet Singh <karanjes@usc.edu>),
>> >>>> Professor Mattmann <mattmann@usc.edu> and/or
>> >>>> irds-L@mymaillists.usc.edu.
>> >>>>
>> >>>> Thanks & Regards,
>> >>>> Karanjeet Singh
>> >>>> C.S. Graduate Student
>> >>>> University of Southern California
>> >>>> karanjes@usc.edu | +1-213-675-9583
>> >>>
>> >>
>>
>>
