creadur-dev mailing list archives

From Andre Fischer
Subject Re: Handling of binary files
Date Thu, 23 Aug 2012 07:00:39 GMT
On 22.08.2012 22:26, Robert Burrell Donkin wrote:
> On 08/22/12 09:01, Andre Fischer wrote:
>> Hi,
> Hi Andre
>> I am working with the Apache OpenOffice (Incubating) project. We are
>> using Rat to find potential license problems.
> Cool. You might also be interested in taking a look at Whisker (once
> it's released), a tool aimed at making it easier to maintain accurate
> legal documents.
> <snip>
>> I am currently working on better integrating Rat into our build process
>> and I have a question regarding the handling of binary files. Rat is
>> started from an Ant script, so file exclusion is handled by Ant. Some of
>> our files are binary, e.g. OpenOffice text documents that are used for
>> testing, zip files, jar files, images. If such a binary is reported by
>> Rat then its content (well, a part of it) is written to the output xml
>> file. This causes an error in the xslt transformation that we use to
>> transform the output xml to html.
>> Now my question. Is there a way to mark certain files as binary so that
>> they are still reported but without including their content?
> I'm not sure anything springs to mind (hopefully someone will jump in if
> I'm mistaken) but this is something Rat should support and I'd be happy
> to help develop a solution.

Thanks, I may take you up on your offer.

> When Rat runs, it classifies documents and processes them based on the
> category. Those it classifies as 'binary' are not checked for license
> headers but are included in the report.
> It would be reasonable to allow additional inclusion and exclusion regex
> matchers to allow 'binary' diagnosis to be customised.
> Would this be good enough?


> Rat hard codes a list of binary files, so if you give us a list of
> problematic file extensions and types we might be able to add some.

I would prefer the dynamic solution you outlined above.
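
For illustration only, here is a minimal sketch of how such inclusion and
exclusion regex matchers might look. It is not taken from the Rat source:
the class BinaryNameClassifier and its methods include, exclude and
isBinary are made-up names, and the rule that exclusions win over
inclusions is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Rat: decides from the file name alone
// whether a document should be classified as binary.
final class BinaryNameClassifier {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    // File names that fully match this regex are treated as binary.
    BinaryNameClassifier include(String regex) {
        includes.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
        return this;
    }

    // File names that fully match this regex are never treated as binary.
    // Assumption: an exclusion always wins over an inclusion.
    BinaryNameClassifier exclude(String regex) {
        excludes.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
        return this;
    }

    boolean isBinary(String fileName) {
        for (Pattern p : excludes) {
            if (p.matcher(fileName).matches()) {
                return false;
            }
        }
        for (Pattern p : includes) {
            if (p.matcher(fileName).matches()) {
                return true;
            }
        }
        // No rule matched: a real integration would fall back to the
        // built-in classification here.
        return false;
    }
}
```

With something like this, a build could register a pattern such as
".*\\.(odt|zip|jar|png)" once, and the matching files would still be
reported, only without their content.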

> As far as dumping binary data out and stuffing up the XSLT, it's
> probably unwise for Rat to do this in any case. Unless Rat starts
> supporting encodings more fully, probably wise to sniff out
> ASCII-nature. Not sure how smart this would need to be...
> Opinions? Objections? Alternatives? Improvements?

If I find the time I will look at the Rat source code and see if I can 
come up with a solution.
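
As a starting point, the "sniff out ASCII-nature" idea could be sketched
roughly as below. This is a sketch under stated assumptions, not Rat code:
the class AsciiSniffer, the method looksLikeAscii and the 1024-byte sample
size are all invented for the example, and a real check would probably
need to be smarter about encodings such as UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, NOT part of Rat: guesses whether content is plain
// ASCII text by inspecting a sample from the start of the data.
final class AsciiSniffer {
    // Assumed sample size; chosen arbitrarily for this sketch.
    private static final int SAMPLE_SIZE = 1024;

    // Returns true if the first 'length' bytes are printable ASCII or
    // common whitespace (tab, line feed, carriage return).
    // An empty sample counts as text.
    static boolean looksLikeAscii(byte[] data, int length) {
        for (int i = 0; i < length; i++) {
            int b = data[i] & 0xFF;
            boolean whitespace = b == '\t' || b == '\n' || b == '\r';
            boolean printable = b >= 0x20 && b < 0x7F;
            if (!whitespace && !printable) {
                return false;
            }
        }
        return true;
    }

    static boolean looksLikeAscii(byte[] data) {
        return looksLikeAscii(data, data.length);
    }

    // Reads up to SAMPLE_SIZE bytes from the stream and checks them.
    static boolean looksLikeAscii(InputStream in) throws IOException {
        byte[] buf = new byte[SAMPLE_SIZE];
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return looksLikeAscii(buf, total);
    }
}
```

A report writer could then skip the content of any document for which
looksLikeAscii returns false, so that no raw bytes reach the output xml.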

