creadur-dev mailing list archives

From Robert Burrell Donkin
Subject Re: Handling of binary files
Date Wed, 22 Aug 2012 20:26:38 GMT
On 08/22/12 09:01, Andre Fischer wrote:
> Hi,

Hi Andre

> I am working with the Apache OpenOffice (Incubating) project. We are
> using Rat to find potential license problems.

Cool. You might also be interested in taking a look at Whisker (once 
it's released), a tool aimed at making it easier to maintain accurate 
legal documents.


> I am currently working on better integrating Rat into our build process
> and I have a question regarding the handling of binary files. Rat is
> started from an Ant script, so file exclusion is handled by Ant. Some of
> our files are binary, eg OpenOffice text documents that are used for
> testing, zip files, jar files, images. If such a binary is reported by
> Rat then its content (well, a part of it) is written to the output xml
> file. This causes an error in the xslt transformation that we use to
> transform the output xml to html.
> Now my question. Is there a way to mark certain files as binary so that
> they are still reported but without including their content?

Nothing springs to mind (hopefully someone will jump in if I'm
mistaken), but this is something Rat should support, and I'd be happy
to help develop a solution.

When Rat runs, it classifies documents and processes them based on the 
category. Those it classifies as 'binary' are not checked for license 
headers but are included in the report.

It would be reasonable to add inclusion and exclusion regex matchers
so that the 'binary' diagnosis can be customised.
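
To make the idea concrete, here is a rough sketch of how such matchers
might override the built-in guess. Everything in it is hypothetical:
BinaryOverride, isBinary and the defaultGuess parameter are names made
up for illustration and are not part of Rat's current API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT an existing Rat class.
final class BinaryOverride {
    private final List<Pattern> forceBinary = new ArrayList<>();
    private final List<Pattern> forceText = new ArrayList<>();

    BinaryOverride(List<String> binaryRegexes, List<String> textRegexes) {
        for (String r : binaryRegexes) forceBinary.add(Pattern.compile(r));
        for (String r : textRegexes) forceText.add(Pattern.compile(r));
    }

    // defaultGuess stands in for whatever the built-in classification decided.
    boolean isBinary(String fileName, boolean defaultGuess) {
        // Exclusions are checked first, so they win over inclusions.
        if (matchesAny(forceText, fileName)) return false;
        if (matchesAny(forceBinary, fileName)) return true;
        return defaultGuess;
    }

    private static boolean matchesAny(List<Pattern> patterns, String name) {
        for (Pattern p : patterns) {
            // matches() requires the whole file name to match the regex.
            if (p.matcher(name).matches()) return true;
        }
        return false;
    }
}
```

Letting exclusions win is only one possible choice; the point is that
a name matching neither list falls back to the existing classification.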

Would this be good enough?

Rat hard-codes a list of binary files, so if you give us a list of the
problematic file extensions and types, we might be able to add some.

As for dumping binary data into the report and breaking the XSLT, it's
probably unwise for Rat to do this in any case. Unless Rat starts
supporting encodings more fully, it would probably be wise to check
whether content looks like ASCII before writing it out. I'm not sure
how smart this check would need to be...
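
A deliberately naive sketch of such a check, just to have something to
argue about. AsciiSniffer and looksLikeAscii are hypothetical names,
and the rule used (printable ASCII plus tab, LF and CR in the first few
bytes) is my assumption, not how Rat behaves today.

```java
// Hypothetical helper, NOT an existing Rat class.
final class AsciiSniffer {
    // Returns true if each of the first `limit` bytes of the sample is
    // printable ASCII (0x20..0x7E) or a tab, line feed or carriage return.
    static boolean looksLikeAscii(byte[] sample, int limit) {
        int n = Math.min(sample.length, limit);
        for (int i = 0; i < n; i++) {
            int b = sample[i] & 0xFF; // treat the byte as unsigned
            boolean whitespace = b == '\t' || b == '\n' || b == '\r';
            boolean printable = b >= 0x20 && b <= 0x7E;
            if (!whitespace && !printable) {
                return false;
            }
        }
        return true;
    }
}
```

Content failing the check would be left out of the report. Note that
this also rejects non-ASCII text such as UTF-8 with accented
characters, which is exactly where it may need to be smarter.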

Opinions? Objections? Alternatives? Improvements?

