creadur-dev mailing list archives

From "Sebb (JIRA)" <>
Subject [jira] [Reopened] (RAT-81) MalformedInputException thrown when RAT tries reading file
Date Tue, 16 Sep 2014 13:25:34 GMT


Sebb reopened RAT-81:
      Assignee:     (was: Stefan Bodewig)

It does not seem right to me to mark XML files with invalid contents as binary.
Binary implies that the file does not need a license, but that is not the case.

Such files should still have a valid license (unless excluded), so RAT should report the file
as unreadable or similar.
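
To illustrate the distinction, here is a minimal sketch of a check that reports "unreadable" rather than "binary" when a file's bytes do not decode in the expected charset. The class name, the Status enum and the check method are hypothetical and are not part of RAT. The sketch uses an explicit CharsetDecoder with CodingErrorAction.REPORT because, as far as I know, a plain FileReader on current JDKs substitutes replacement characters instead of throwing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not RAT code: separates "cannot be decoded" from "binary".
public class DecodeStatusSketch {

    public enum Status { READABLE, UNREADABLE }

    /** Decodes strictly; any malformed or unmappable input yields UNREADABLE. */
    public static Status check(byte[] data, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return Status.READABLE;
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            return Status.UNREADABLE;
        }
    }

    public static void main(String[] args) {
        // 0xE9 followed by an ASCII byte is not a valid UTF-8 sequence.
        byte[] notUtf8 = {'c', 'a', 'f', (byte) 0xE9, 's'};
        byte[] utf8 = "caf\u00e9s".getBytes(StandardCharsets.UTF_8);
        System.out.println(check(notUtf8, StandardCharsets.UTF_8));
        System.out.println(check(utf8, StandardCharsets.UTF_8));
    }
}
```

A file classified as UNREADABLE would still count as needing a license; it would simply be reported as something RAT could not inspect.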

> MalformedInputException thrown when RAT tries reading file
> ----------------------------------------------------------
>                 Key: RAT-81
>                 URL:
>             Project: Apache Rat
>          Issue Type: Bug
>          Components: engine
>    Affects Versions: 0.6, 0.7
>         Environment: Linux (Ubuntu) on x86, running with "default" file encoding set to UTF-8
>            Reporter: Marshall Schor
>            Priority: Minor
>             Fix For: 0.8
> To reproduce, set the platform default locale to something that indicates UTF-8 file
> encoding.
> This causes code in (for example) org.apache.rat.document.impl.FileDocument, which returns
> a FileReader, to set up RAT with a reader that uses the platform default character encoding
> (in this case UTF-8).
> If the file being processed is not encoded in this encoding, the reader may read data
> that is not valid UTF-8, which causes it to throw a MalformedInputException.
> One case we found:
> The file being examined contained invalid UTF-8 byte sequences.  First, Rat ran the
> BinaryGuesser, which tried to read the first 100 or so chars and got a
> "MalformedInputException" instead; its try/catch block simply returned "false"
> (not binary).  Then the HeaderChecker tried to read the file to check the header and got
> the same exception, but this time it made RAT fail.
> Here's the last part of the stack trace:
> Caused by: Analysis failed
>     at
>     at
>     at
>     at
>     at
>     ... 23 more
> Caused by: org.apache.rat.document.RatDocumentAnalysisException: Cannot analyse header
>     at
>     at org.apache.rat.document.impl.util.DocumentAnalyserMultiplexer.analyse(
>     at org.apache.rat.document.impl.util.ConditionalAnalyser.matches(
>     at org.apache.rat.document.impl.util.ConditionalAnalyser.analyse(
>     at
>     ... 27 more
> Caused by: org.apache.rat.analysis.RatHeaderAnalysisException: Cannot read header for
>     at
>     at
>     ... 31 more
> Caused by:
>     at
>     at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(
>     at sun.nio.cs.StreamDecoder$ConverterSD.implRead(
>     at
>     at
>     at
>     at
>     at
>     at
>     at
>     ... 32 more 
> Work-around: mark these files for explicit exclusion.
> Fix: change the BinaryGuesser to read the files as raw bytes (not assuming any character
> encoding) and operate on that data.
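
The fix proposed in the quoted report could be sketched roughly as follows. This is an illustration under stated assumptions, not RAT's actual BinaryGuesser: the class name, the 100-byte sample size, and the heuristic (a NUL byte, or more than a quarter of the sample being control bytes) are all hypothetical. The point is only that working on raw bytes involves no decoder, so no MalformedInputException can occur.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not RAT code: guesses "binary" from raw bytes only.
public class ByteBinaryGuesserSketch {

    private static final int SAMPLE_SIZE = 100; // assumed sample size

    public static boolean looksBinary(InputStream in) throws IOException {
        byte[] sample = new byte[SAMPLE_SIZE];
        int total = 0;
        int n;
        // Fill the sample buffer; a single read() may return fewer bytes than requested.
        while (total < SAMPLE_SIZE
                && (n = in.read(sample, total, SAMPLE_SIZE - total)) != -1) {
            total += n;
        }
        int suspicious = 0;
        for (int i = 0; i < total; i++) {
            int b = sample[i] & 0xFF;
            if (b == 0) {
                return true; // a NUL byte is treated as a strong sign of binary data
            }
            boolean whitespace = b == '\t' || b == '\n' || b == '\r' || b == '\f';
            if (b < 0x20 && !whitespace) {
                suspicious++; // control byte other than common whitespace
            }
        }
        // Assumed threshold: more than 25% control bytes means binary.
        return total > 0 && suspicious * 4 > total;
    }

    public static void main(String[] args) throws IOException {
        // Latin-1 text that is invalid as UTF-8 is still treated as text here.
        byte[] latin1Text = {'c', 'a', 'f', (byte) 0xE9, 's'};
        byte[] withNul = {'P', 'K', 3, 4, 0, 0};
        System.out.println(looksBinary(new ByteArrayInputStream(latin1Text)));
        System.out.println(looksBinary(new ByteArrayInputStream(withNul)));
    }
}
```

With a byte-based heuristic like this, a text file in an unexpected encoding is not classified as binary, so it would still go on to the header check.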

This message was sent by Atlassian JIRA
