tika-dev mailing list archives

From "Nicholas DiPiazza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2224) Mime magic for OneNote formats
Date Tue, 15 Jan 2019 19:42:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743338#comment-16743338 ]

Nicholas DiPiazza commented on TIKA-2224:

Well, we can either call the C++ executable, or just port its logic over to Java so that
it does what the executable does.

Let's say we did call the C++ executable and it generated JSON from the OneNote file.
How would we then take that JSON and get it parsed into Tika? Is there an example of a
parser that converts its input to JSON and then parses that JSON? I'm not sure how that would work.
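
One way the JSON hand-off could work is sketched below in Java. It assumes the converter
prints JSON to stdout and that page text sits under a "text" key; both are assumptions, and
the real layout of Sample1.json may differ. Tika parsers report content as XHTML SAX events
on a ContentHandler, so the sketch emits the same kind of events using only JDK classes.
The class and method names are hypothetical, and the regex is only a stand-in for a real
JSON library such as Jackson.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: external converter -> JSON -> XHTML SAX events.
class OneNoteJsonSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Assumption: text lives in "text": "..." pairs. Values containing
    // escape sequences are skipped; a real JSON parser should replace this.
    private static final Pattern TEXT_VALUE =
            Pattern.compile("\"text\"\\s*:\\s*\"([^\"\\\\]*)\"");

    // Runs the external converter and returns whatever it wrote to stdout.
    static String runConverter(List<String> command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String json = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("converter exited with code " + exitCode);
        }
        return json;
    }

    // Collects every "text" value in document order.
    static List<String> extractText(String json) {
        List<String> values = new ArrayList<>();
        Matcher matcher = TEXT_VALUE.matcher(json);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    // Emits one <p> element per text value to the given handler.
    static void emit(String json, ContentHandler handler) throws SAXException {
        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        handler.startElement(XHTML, "body", "body", noAttributes);
        for (String value : extractText(json)) {
            handler.startElement(XHTML, "p", "p", noAttributes);
            char[] chars = value.toCharArray();
            handler.characters(chars, 0, chars.length);
            handler.endElement(XHTML, "p", "p");
        }
        handler.endElement(XHTML, "body", "body");
        handler.endDocument();
    }

    // Minimal handler that gathers character data, one line per paragraph.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("p".equals(localName)) {
                text.append('\n');
            }
        }
    }

    // Convenience wrapper for trying the sketch without a real handler.
    static String toPlainText(String json) {
        TextCollector collector = new TextCollector();
        try {
            emit(json, collector);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.text.toString();
    }
}
```

Inside a parser, the call would look roughly like
emit(runConverter(List.of("onenote2json", "Sample1.one")), handler), where "onenote2json"
is a made-up executable name. Porting the logic to Java would drop runConverter entirely
and emit the SAX events directly while reading the file.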

> Mime magic for OneNote formats
> ------------------------------
>                 Key: TIKA-2224
>                 URL: https://issues.apache.org/jira/browse/TIKA-2224
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.14
>            Reporter: Nick Burch
>            Priority: Major
>         Attachments: Sample1.json, Sample1.one, note-ssn-test-mmmm.one
>
> As raised at http://stackoverflow.com/questions/41272195/onenote-support-for-apache-tika-parsers,
> we don't have any magic for the OneNote formats. Several years ago we dug out the file format
> specs (see http://lucene.472066.n3.nabble.com/Tika-OneNote-Support-td4020393.html), but didn't
> have volunteer energy to implement a parser. However, armed with those specs, we should be
> able to come up with some mime magic for detection.
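
Magic-based detection of this kind amounts to comparing the first bytes of a file with a
known signature. The Java sketch below illustrates the idea for .one section files. The
16-byte value is the little-endian encoding of the file-type GUID
{7B5C52E4-D88C-4DA7-AEB1-5378D02996D3} as I understand it from the OneNote file format
spec; treat it as an assumption and check it against the spec and the sample attachments.
The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of magic-byte detection for OneNote .one files.
class OneNoteMagicSketch {
    // Assumed signature: GUID {7B5C52E4-D88C-4DA7-AEB1-5378D02996D3}
    // stored at offset 0 in little-endian GUID layout.
    static final byte[] ONE_MAGIC = {
        (byte) 0xE4, 0x52, 0x5C, 0x7B, (byte) 0x8C, (byte) 0xD8, (byte) 0xA7, 0x4D,
        (byte) 0xAE, (byte) 0xB1, 0x53, 0x78, (byte) 0xD0, 0x29, (byte) 0x96, (byte) 0xD3
    };

    // Returns true when the buffer starts with the assumed .one signature.
    static boolean looksLikeOneNote(byte[] header) {
        if (header == null || header.length < ONE_MAGIC.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(header, ONE_MAGIC.length);
        return Arrays.equals(prefix, ONE_MAGIC);
    }
}
```

In Tika itself the same comparison would normally be declared as a magic match in the
mime types definition rather than written by hand; the code only shows what that match
checks.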
