creadur-dev mailing list archives

From Andre Fischer <awf....@gmail.com>
Subject Re: Handling of binary files
Date Thu, 23 Aug 2012 07:00:39 GMT
On 22.08.2012 22:26, Robert Burrell Donkin wrote:
> On 08/22/12 09:01, Andre Fischer wrote:
>> Hi,
>
> Hi Andre
>
>> I am working with the Apache OpenOffice (Incubating) project. We are
>> using Rat to find potential license problems.
>
> Cool. You might also be interested in taking a look at Whisker (once
> it's released), a tool aimed at making it easier to maintain accurate
> legal documents.
>
> <snip>
>
>> I am currently working on better integrating Rat into our build process
>> and I have a question regarding the handling of binary files. Rat is
>> started from an Ant script, so file exclusion is handled by Ant. Some of
>> our files are binary, e.g. OpenOffice text documents that are used for
>> testing, zip files, jar files, images. If such a binary is reported by
>> Rat then its content (well, a part of it) is written to the output xml
>> file. This causes an error in the xslt transformation that we use to
>> transform the output xml to html.
>>
>> Now my question. Is there a way to mark certain files as binary so that
>> they are still reported but without including their content?
>
> I'm not sure anything springs to mind (hopefully someone will jump in if
> I'm mistaken) but this is something Rat should support and I'd be happy
> to help develop a solution.

Thanks, I may take you up on your offer.

>
> When Rat runs, it classifies documents and processes them based on the
> category. Those it classifies as 'binary' are not checked for license
> headers but are included in the report.
>
> It would be reasonable to allow additional inclusion and exclusion regex
> matchers to allow 'binary' diagnosis to be customised.
>
> Would this be good enough?

Absolutely.
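
To make the idea concrete, the inclusion and exclusion regex matchers might look roughly like the sketch below. This is only an illustration under assumptions: the class BinaryNameMatcher, the Decision enum, and the rule that exclusions win over inclusions are all made up here and are not part of Rat's actual API.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of Rat: user-configurable 'binary' diagnosis by file name. */
public class BinaryNameMatcher {

    /** Result of applying the user rules to one file name. */
    public enum Decision { BINARY, NOT_BINARY, UNDECIDED }

    private final List<Pattern> include; // names that should be treated as binary
    private final List<Pattern> exclude; // names that should never be treated as binary

    public BinaryNameMatcher(List<String> includeRegexes, List<String> excludeRegexes) {
        this.include = compile(includeRegexes);
        this.exclude = compile(excludeRegexes);
    }

    private static List<Pattern> compile(List<String> regexes) {
        return regexes.stream()
                .map(r -> Pattern.compile(r, Pattern.CASE_INSENSITIVE))
                .collect(Collectors.toList());
    }

    private static boolean matchesAny(List<Pattern> patterns, String name) {
        for (Pattern p : patterns) {
            if (p.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    /**
     * Assumption: exclusions take precedence over inclusions. UNDECIDED means
     * no user rule applied, so the caller falls back to the built-in detection.
     */
    public Decision classify(String fileName) {
        if (matchesAny(exclude, fileName)) {
            return Decision.NOT_BINARY;
        }
        if (matchesAny(include, fileName)) {
            return Decision.BINARY;
        }
        return Decision.UNDECIDED;
    }
}
```

For our case the inclusion list would contain patterns such as `.*\.od[tsp]` and `.*\.zip`, so those files are still reported but never have their content dumped into the output.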

>
> Rat hard-codes a list of binary files, so if you give us a list of
> problematic file extensions and types we might be able to add some.

I would prefer the dynamic solution you outlined above.

>
>
> As far as dumping binary data out and stuffing up the XSLT, it's
> probably unwise for Rat to do this in any case. Unless Rat starts
> supporting encodings more fully, probably wise to sniff out
> ASCII-nature. Not sure how smart this would need to be...
>
> Opinions? Objections? Alternatives? Improvements?

If I find the time I will look at the Rat source code and see if I can 
come up with a solution.
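
As a starting point for the "sniff out ASCII-nature" idea, a very simple check might look like the sketch below. It is an assumption-laden illustration, not what Rat does today: the class AsciiSniffer, the method looksLikeAscii and the ratio threshold are hypothetical, and the heuristic (any NUL byte means binary, otherwise count bytes outside printable ASCII and common whitespace) is just one possible choice.

```java
/** Hypothetical helper, not part of Rat: guesses whether a byte sample is plain ASCII text. */
public class AsciiSniffer {

    /**
     * Returns true if the sample looks like ASCII text.
     *
     * @param sample          leading bytes of the file (for example the first few KB)
     * @param maxNonTextRatio tolerated fraction of non-text bytes, e.g. 0.05
     */
    public static boolean looksLikeAscii(byte[] sample, double maxNonTextRatio) {
        if (sample.length == 0) {
            return true; // an empty file contains nothing that could break the XML
        }
        int nonText = 0;
        for (byte b : sample) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c == 0) {
                return false; // a NUL byte is a strong hint for binary content
            }
            boolean printable = c >= 0x20 && c < 0x7F;
            boolean whitespace = c == '\t' || c == '\n' || c == '\r' || c == '\f';
            if (!printable && !whitespace) {
                nonText++;
            }
        }
        return (double) nonText / sample.length <= maxNonTextRatio;
    }
}
```

Files that fail such a check could be reported as binary without writing any of their content to the output xml, which should also keep the xslt transformation from failing.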

-Andre

