tika-dev mailing list archives

From "Nicholas DiPiazza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2224) Mime magic for OneNote formats
Date Tue, 15 Jan 2019 19:42:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743338#comment-16743338
] 

Nicholas DiPiazza commented on TIKA-2224:
-----------------------------------------

Well, we can either call the C++ executable directly or port its logic over to Java and do the
same thing there.

Let's say we did call the C++ executable and had it generate JSON from the OneNote file.
How would we then take that JSON and get it parsed into Tika? Is there an example of an
existing parser that converts something to JSON and then parses that JSON? I'm not sure how
that would work.
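
For discussion, here is a rough sketch of what the "call the executable, then parse its JSON"
route could look like. It is not an existing Tika parser. It uses only the JDK, and it relies
on the fact that Tika's Parser.parse(...) hands the parser an org.xml.sax.ContentHandler to
emit XHTML events into. Everything else is an assumption: the class name OneNoteJsonBridge,
the executable's command line (a single file-path argument, JSON on stdout), and the "text"
field in the JSON are all hypothetical. The regex is only a stand-in for a real JSON library
and leaves JSON escape sequences untouched.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class OneNoteJsonBridge {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Assumption: the converter's JSON carries page text in string fields named "text".
    private static final Pattern TEXT_FIELD =
            Pattern.compile("\"text\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Spools the stream to a temp file, runs the (hypothetical) converter, returns its stdout. */
    public static String runConverter(String exe, InputStream in)
            throws IOException, InterruptedException {
        Path tmp = Files.createTempFile("onenote", ".one");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            Process p = new ProcessBuilder(exe, tmp.toString())
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            String json = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IOException("converter exited with code " + exit);
            }
            return json;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    /** Collects the raw (still JSON-escaped) values of every "text" field, in document order. */
    public static List<String> extractText(String json) {
        List<String> out = new ArrayList<>();
        Matcher m = TEXT_FIELD.matcher(json);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    /** Emits each string as an XHTML paragraph to the handler a parser would be given. */
    public static void emitXhtml(List<String> paragraphs, ContentHandler handler)
            throws SAXException {
        AttributesImpl none = new AttributesImpl();
        handler.startDocument();
        handler.startElement(XHTML, "html", "html", none);
        handler.startElement(XHTML, "body", "body", none);
        for (String paragraph : paragraphs) {
            char[] chars = paragraph.toCharArray();
            handler.startElement(XHTML, "p", "p", none);
            handler.characters(chars, 0, chars.length);
            handler.endElement(XHTML, "p", "p");
        }
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endDocument();
    }
}
```

Inside a real parser, the parse method would call runConverter on the incoming stream and pass
the result through extractText and emitXhtml using the ContentHandler it was given. Tika also
ships an ExternalParser for wrapping command-line tools, which may be worth looking at before
writing something like this by hand.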


> Mime magic for OneNote formats
> ------------------------------
>
>                 Key: TIKA-2224
>                 URL: https://issues.apache.org/jira/browse/TIKA-2224
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.14
>            Reporter: Nick Burch
>            Priority: Major
>         Attachments: Sample1.json, Sample1.one, note-ssn-test-mmmm.one
>
>
> As raised at http://stackoverflow.com/questions/41272195/onenote-support-for-apache-tika-parsers,
> we don't have any magic for the OneNote formats. Several years ago we dug out the file format
> specs (see http://lucene.472066.n3.nabble.com/Tika-OneNote-Support-td4020393.html), but didn't
> have the volunteer energy to implement a parser. However, armed with those specs, we should be
> able to come up with some mime magic for detection.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)