commons-dev mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject Re: [COMPRESS] TIFF file identified as TAR
Date Tue, 27 Feb 2018 20:46:22 GMT
On 2018-02-27, Stefan Bodewig wrote:

> On 2018-02-27, Allison, Timothy B. wrote:

>>    On TIKA-2591[0], a user reports that a specific type of TIFF is
>>    being identified as a TAR file.  Is this something we should try to
>>    fix at the Tika level, or is this something that would be better
>>    fixed in COMPRESS?

> TAR auto-detection is, erm, clumsy. But this is due to the format not
> being built for being detected.

> This is how it works right now:

> * read the first candidate header of 512 bytes

> * look at the eight bytes that contain the "ustar" string and the
>   version and verify they look like something we support.

> * verify the checksum of the candidate tar header

Actually, I was misreading the code. Detection succeeds if either
"ustar and version look good" or "parses as a tar header with a correct
checksum" holds, so the chance of false positives is bigger.

Unfortunately this has proven necessary to detect all valid TAR
archives: https://issues.apache.org/jira/browse/COMPRESS-117
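
For illustration, the either/or check described above can be sketched
roughly as follows. This is a minimal sketch and not the actual Commons
Compress code: the class TarSniffer and all of its method names are
hypothetical, and the field offsets are the ones from the POSIX ustar
header layout (checksum at byte 148, magic and version at byte 257).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, NOT part of Commons Compress.
class TarSniffer {
    static final int HEADER_SIZE = 512;
    static final int CHKSUM_OFFSET = 148;
    static final int CHKSUM_LEN = 8;
    static final int MAGIC_OFFSET = 257;   // 6 bytes magic + 2 bytes version

    // "ustar\0" + "00" (POSIX) and "ustar " + " \0" (old GNU)
    static final byte[] POSIX_MAGIC = {'u', 's', 't', 'a', 'r', 0, '0', '0'};
    static final byte[] GNU_MAGIC = {'u', 's', 't', 'a', 'r', ' ', ' ', 0};

    // Accept if EITHER the magic/version matches OR the checksum is valid.
    static boolean looksLikeTar(byte[] h) {
        return magicLooksGood(h) || checksumLooksGood(h);
    }

    static boolean magicLooksGood(byte[] h) {
        if (h.length < HEADER_SIZE) return false;
        byte[] m = Arrays.copyOfRange(h, MAGIC_OFFSET, MAGIC_OFFSET + 8);
        return Arrays.equals(m, POSIX_MAGIC) || Arrays.equals(m, GNU_MAGIC);
    }

    static boolean checksumLooksGood(byte[] h) {
        if (h.length < HEADER_SIZE) return false;
        long stored = parseOctal(h, CHKSUM_OFFSET, CHKSUM_LEN);
        return stored >= 0 && stored == computeChecksum(h);
    }

    // Sum of all header bytes, with the checksum field counted as spaces.
    static long computeChecksum(byte[] h) {
        long sum = 0;
        for (int i = 0; i < HEADER_SIZE; i++) {
            boolean inChk = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LEN;
            sum += inChk ? ' ' : (h[i] & 0xFF);
        }
        return sum;
    }

    // Parses an octal field; returns -1 if it holds no valid octal digits.
    static long parseOctal(byte[] b, int off, int len) {
        int i = off;
        int end = off + len;
        while (i < end && b[i] == ' ') i++;
        long value = 0;
        int digits = 0;
        for (; i < end; i++) {
            byte c = b[i];
            if (c == 0 || c == ' ') break;
            if (c < '0' || c > '7') return -1;
            value = (value << 3) + (c - '0');
            digits++;
        }
        return digits == 0 ? -1 : value;
    }

    // Test helper: stores the checksum as six octal digits, NUL, space.
    static void writeChecksum(byte[] h) {
        byte[] oct = String.format("%06o", computeChecksum(h))
                .getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(oct, 0, h, CHKSUM_OFFSET, 6);
        h[CHKSUM_OFFSET + 6] = 0;
        h[CHKSUM_OFFSET + 7] = ' ';
    }
}
```

Because the checksum is only a plain sum of 512 bytes, a non-tar file
whose bytes 148-155 happen to hold a matching octal number passes the
second branch, which is where false positives can come from.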

Stefan
