tika-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: Mime type identification of plain text files.
Date Mon, 04 Aug 2008 07:57:41 GMT

2008/8/2 Antoni Mylka <antoni.mylka@gmail.com>:
> Many binary formats begin with magic byte sequences composed of ASCII
> characters, e.g.
> ZIP files begin with PK
> PDFs begin with %PDF-
> CHM help files begin with ITSF
> etc.
> Does Tika make any attempt to distinguish normal plain-text ASCII documents
> that happen to begin with 'PK' from ZIP files?

Not at the moment, but it probably should... I created an improvement
issue for that, TIKA-154.
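
To illustrate one possible approach (a rough sketch only, not Tika's
detector code, and not necessarily how TIKA-154 will be resolved): a ZIP
record signature is "PK" followed by two non-printable bytes (03 04 for a
local file header, 05 06 for an empty archive, 07 08 for a spanned one),
so checking four bytes instead of two already excludes ordinary text that
starts with "PK". The class and method names below are hypothetical.

```java
// Hypothetical helper for illustration; not part of the Tika API.
final class ZipMagicSketch {

    // Returns true only if the first four bytes form a ZIP record signature.
    static boolean looksLikeZip(byte[] head) {
        if (head == null || head.length < 4) {
            return false;
        }
        if (head[0] != 'P' || head[1] != 'K') {
            return false;
        }
        int b2 = head[2] & 0xFF;
        int b3 = head[3] & 0xFF;
        return (b2 == 0x03 && b3 == 0x04)   // local file header
            || (b2 == 0x05 && b3 == 0x06)   // end of central directory (empty archive)
            || (b2 == 0x07 && b3 == 0x08);  // spanned archive marker
    }
}
```

A text file beginning with "PK is ..." fails the check because its third
and fourth bytes are printable ASCII. The same trick does not help for
formats such as PDF, whose magic (%PDF-) is entirely printable, so those
would need some other heuristic.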


Jukka Zitting
