incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "DRATProposal" by ChrisMattmann
Date Thu, 10 Aug 2017 14:08:09 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "DRATProposal" page has been changed by ChrisMattmann:
https://wiki.apache.org/incubator/DRATProposal?action=diff&rev1=22&rev2=23

Comment:
- fix TMs

  
  == Abstract ==
  
- Apache Distributed Release Audit Tool (DRAT) is a distributed, parallelized (Map Reduce)
wrapper around Apache™ RAT to allow it to complete on large code repositories of multiple
file types where Apache™ RAT hangs forever.
+ Apache Distributed Release Audit Tool (DRAT) is a distributed, parallelized (Map Reduce)
wrapper around Apache RAT™ to allow it to complete on large code repositories of multiple
file types where Apache™ RAT hangs forever.
  
  == Proposal ==
  
- Apache DRAT is a distributed, parallelized (Map Reduce) wrapper around Apache™ RAT (Release
Audit Tool). RAT is used to check for proper licensing in software projects. However, RAT
takes a prohibitively long time to analyze large repositories of code, since it can only run
on one JVM. Furthermore, RAT isn't customizable by file type or file size and provides no
incremental output. This wrapper dramatically speeds up the process by leveraging Apache™
OODT to parallelize and workflow the following components:
+ Apache DRAT is a distributed, parallelized (Map Reduce) wrapper around Apache RAT™ (Release
Audit Tool). RAT is used to check for proper licensing in software projects. However, RAT
takes a prohibitively long time to analyze large repositories of code, since it can only run
on one JVM. Furthermore, RAT isn't customizable by file type or file size and provides no
incremental output. This wrapper dramatically speeds up the process by leveraging Apache OODT™
to parallelize and workflow the following components:
  
-  * Apache™ Solr based exploration of a CM repository (e.g., Git, SVN, etc.) and classification
of that repository based on MIME type using Apache™ Tika.
+  * Apache Solr™ based exploration of a CM repository (e.g., Git, SVN, etc.) and classification
of that repository based on MIME type using Apache Tika™.
-  * A MIME partitioner that uses Apache™ Tika to automatically deduce and classify by file
type and then partition Apache™ RAT jobs based on sets of 100 files per type (configurable)
-- the M/R "partitioner"
+  * A MIME partitioner that uses Apache Tika™ to automatically deduce and classify by file
type and then partition Apache™ RAT jobs based on sets of 100 files per type (configurable)
-- the M/R "partitioner"
-  * A throttle wrapper for RAT to MIME targeted Apache™ RAT. -- the M/R "mapper"
+  * A throttle wrapper for RAT to MIME targeted Apache RAT™. -- the M/R "mapper"
   * A reducer to "combine" the produced RAT logs together into a global RAT report that can
be used for stats generation. -- the M/R "reducer"
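
The partitioner, mapper and reducer roles listed above can be sketched in a few lines. This is only an illustrative sketch, not DRAT's implementation: the function names (`partition_by_mime`, `audit_chunk`, `combine`) are hypothetical, Python's standard `mimetypes` module stands in for Apache Tika, and the "mapper" merely counts files where the real tool would run RAT on each chunk.

```python
import mimetypes
from collections import Counter, defaultdict


def guess_mime(path):
    """Stand-in classifier (assumption: the real system uses Apache Tika)."""
    mime, _ = mimetypes.guess_type(path)
    return mime or "application/octet-stream"


def partition_by_mime(paths, chunk_size=100, classify=guess_mime):
    """Partitioner: group paths by type, then split each group into chunks."""
    by_type = defaultdict(list)
    for path in paths:
        by_type[classify(path)].append(path)
    jobs = []
    for mime, group in sorted(by_type.items()):
        for start in range(0, len(group), chunk_size):
            jobs.append((mime, group[start:start + chunk_size]))
    return jobs


def audit_chunk(job):
    """Mapper placeholder: a real mapper would run RAT on the chunk's files."""
    mime, files = job
    return Counter({mime: len(files)})


def combine(reports):
    """Reducer: merge the per-chunk reports into one global report."""
    total = Counter()
    for report in reports:
        total.update(report)
    return total
```

For example, `combine(audit_chunk(job) for job in partition_by_mime(paths))` yields per-type totals for a list of file paths. The chunk size defaults to 100 to mirror the configurable "100 files per type" figure in the proposal.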
  
  == Background and Rationale ==
  
- As a part of the Apache Software Foundation (ASF) project, Apache Creadur, a Release Audit
Tool (RAT) was developed especially in response to demand from the Apache Software Foundation
and its hundreds of projects to provide a capability for release auditing that could be integrated
into projects. The primary function of the RAT is automated code auditing and open-source
license analysis focusing on headers. RAT is a natural language processing tool written in
Java to easily run on any platform and to audit code from many source languages (e.g., C,
C++, Java, Python, etc.). RAT can also be used to add license headers to codes that are not
licensed.
+ As a part of the Apache Software Foundation (ASF) project, Apache Creadur™, a Release
Audit Tool (RAT) was developed especially in response to demand from the Apache Software Foundation
and its hundreds of projects to provide a capability for release auditing that could be integrated
into projects. The primary function of the RAT is automated code auditing and open-source
license analysis focusing on headers. RAT is a natural language processing tool written in
Java to easily run on any platform and to audit code from many source languages (e.g., C,
C++, Java, Python, etc.). RAT can also be used to add license headers to codes that are not
licensed.
  
  In the summer of 2013, our team ran Apache RAT on source code produced from the Defense
Advanced Research Projects Agency (DARPA) XDATA national initiative whose inception coincided
with the 2012 U.S. Presidential Initiative in Big Data. XDATA brought together 24 performers
across academia, private industry and the government to construct analytics, visualizations,
and open source software mash-ups that were transitioned into government projects and to the
defense sector. XDATA produced a large Git repository consisting of ~50,000 files and tens
of millions of lines of code. DARPA XDATA was launched to build a useful infrastructure for
many government agencies and ultimately is an effort to avoid the traditional government-contractor
software pipeline in which additional contracts are required to reuse and to unlock software
previously funded by the government in other programs.
  All XDATA software is open source and is ingested into [[https://opencatalog.darpa.mil/|DARPA’s
Open Catalog]], which points to outputs of the program, including its source code and metrics
on the repository. Because of this, one of the core products of XDATA is the internal Git repository.
Since XDATA brought together open source software across multiple performers, understanding
the licenses that the source code used, and their compatibilities and differences, was
extremely important. Because the repository was so large, our strategy was to develop an
automated process using Apache RAT.

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

