incubator-tika-dev mailing list archives

From "Antoni Mylka" <antoni.my...@gmail.com>
Subject Mime type identification of plain text files.
Date Sat, 02 Aug 2008 11:55:24 GMT
Many binary formats begin with magic byte sequences composed of ASCII
characters, e.g.
ZIP files begin with PK
PDF files begin with %PDF-
CHM help files begin with ITSF
etc.

Does Tika make any attempt to distinguish ordinary ASCII text documents
that happen to begin with 'PK' from ZIP files?
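
To make the concern concrete, here is a minimal Java sketch. It is not
Tika's detection code; the class MagicSketch and all of its methods are
hypothetical names made up for illustration. It contrasts a naive check
that trusts the two-byte 'PK' prefix with a variant that also asks
whether the sampled bytes are all printable ASCII. The second check is
only a heuristic and assumes that real ZIP data contains non-printable
bytes near the start (a ZIP local file header begins with 'P', 'K',
0x03, 0x04).

```java
import java.nio.charset.StandardCharsets;

public class MagicSketch {

    // Hypothetical helper: true if data begins with the given ASCII prefix.
    public static boolean startsWith(byte[] data, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.US_ASCII);
        if (data.length < p.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (data[i] != p[i]) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical heuristic: every byte is printable ASCII or tab/LF/CR.
    public static boolean looksLikePlainAscii(byte[] data) {
        for (byte b : data) {
            boolean printable = b >= 0x20 && b <= 0x7E;
            boolean whitespace = b == '\t' || b == '\n' || b == '\r';
            if (!printable && !whitespace) {
                return false;
            }
        }
        return true;
    }

    // Naive detection: trusts the two-byte prefix alone.
    public static String naiveGuess(byte[] data) {
        return startsWith(data, "PK")
                ? "application/zip"
                : "application/octet-stream";
    }

    // More careful detection: a 'PK' prefix counts as ZIP only when
    // the sample also contains bytes outside the plain-ASCII range.
    public static String carefulGuess(byte[] data) {
        if (looksLikePlainAscii(data)) {
            return "text/plain";
        }
        return startsWith(data, "PK")
                ? "application/zip"
                : "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] text = "PK is a common abbreviation.\n"
                .getBytes(StandardCharsets.US_ASCII);
        // First bytes of a ZIP local file header, followed by filler.
        byte[] zip = {'P', 'K', 3, 4, 20, 0, 0, 0};

        System.out.println("naive(text)   = " + naiveGuess(text));
        System.out.println("careful(text) = " + carefulGuess(text));
        System.out.println("careful(zip)  = " + carefulGuess(zip));
    }
}
```

With the naive check, the text sample is reported as a ZIP file, which
is exactly the misidentification I am asking about. The same ambiguity
is harder to resolve for formats like PDF, whose content can
legitimately consist of ASCII text.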

-- 
Antoni Myłka
antoni.mylka@gmail.com