tika-dev mailing list archives

From 122jxgcn <ywpar...@gmail.com>
Subject Detecting content type with file extension
Date Tue, 07 Aug 2012 09:02:45 GMT
Hello,
I'm having trouble with auto-detection of a custom content type.

I did some debugging and traced the following sequence.
In my test, when AutoDetectParser is invoked, the detector tries to
detect the MediaType.
Inside the detect function of MimeTypes.java (I'm still not sure whether this
is the only detect function being called),
the magic prefix match picks the wrong content type
(application/x-tika-msoffice instead of application/x-hwp),
and it is never corrected because Metadata.RESOURCE_NAME_KEY and
Metadata.CONTENT_TYPE are both empty
(so the lookups based on the resourceName and the metadata hint fail).
As a result, my custom file's type is reported as application/x-tika-msoffice
and my custom parser never gets called.
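
To make the sequence above concrete, here is a minimal stand-alone sketch of the
detection order. It does not call Tika. The class DetectionOrderSketch, its
tables, and the file name sample.hwp are all hypothetical. It also rests on two
assumptions that this message does not confirm: that the magic rule being hit is
the OLE2 compound-document signature, and that a name-based hint replaces the
magic result only when the hinted type is a specialization of that result.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Stand-alone model of the detection order; it does not call Tika.
// All class, method, and table names here are hypothetical.
class DetectionOrderSketch {
    static final String OCTET_STREAM = "application/octet-stream";
    static final String MSOFFICE = "application/x-tika-msoffice";
    static final String HWP = "application/x-hwp";

    // OLE2 compound-document signature (assumed to be what the magic rule matches).
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Stand-in for the glob table: file extension -> media type.
    static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("hwp", HWP);
    }

    // True if base is octet-stream or appears in candidate's parent chain.
    static boolean isSpecializationOf(String candidate, String base,
                                      Map<String, String> parents) {
        if (base.equals(OCTET_STREAM)) {
            return true;
        }
        for (String t = parents.get(candidate); t != null; t = parents.get(t)) {
            if (t.equals(base)) {
                return true;
            }
        }
        return false;
    }

    // Step 1: magic prefix. Step 2: name hint, accepted only if it refines step 1.
    static String detect(byte[] prefix, String resourceName,
                         Map<String, String> parents) {
        String type = OCTET_STREAM;
        if (prefix != null && prefix.length >= OLE2_MAGIC.length
                && Arrays.equals(Arrays.copyOf(prefix, OLE2_MAGIC.length), OLE2_MAGIC)) {
            type = MSOFFICE;
        }
        if (resourceName != null) {
            int dot = resourceName.lastIndexOf('.');
            if (dot >= 0) {
                String ext = resourceName.substring(dot + 1).toLowerCase(Locale.ROOT);
                String hint = BY_EXTENSION.get(ext);
                if (hint != null && isSpecializationOf(hint, type, parents)) {
                    type = hint;
                }
            }
        }
        return type;
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        Map<String, String> hwpUnderMsoffice = Collections.singletonMap(HWP, MSOFFICE);
        System.out.println(detect(OLE2_MAGIC, null, none));
        System.out.println(detect(OLE2_MAGIC, "sample.hwp", none));
        System.out.println(detect(OLE2_MAGIC, "sample.hwp", hwpUnderMsoffice));
    }
}
```

In this model the first two calls both yield application/x-tika-msoffice, and
only the third yields application/x-hwp. If the real detector behaves the same
way, supplying the resource name alone would not be enough: the custom type
would also need to be declared as a subtype of the type the magic match returns.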

The first problem is that the magic match determines the content type
incorrectly, but I think the bigger problem is that the Metadata is empty when
it is passed to the AutoDetectParser.
I'm not sure what I'm doing wrong here.
I'm pretty sure I registered my custom parser and custom MIME type by following
http://tika.apache.org/1.2/parser_guide.html#List_the_new_parser

I'd really like Tika to choose the custom parser based on the file's
extension.
I'm not sure where that logic lives, but I'm pretty sure it's possible,
since I declared the glob pattern *.hwp inside tika-mimetypes.xml.



--
View this message in context: http://lucene.472066.n3.nabble.com/Detecting-content-type-with-file-extension-tp3999546.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
