tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-153) Allow passing of files or memory buffers to parsers
Date Tue, 13 Apr 2010 22:27:50 GMT

    [ https://issues.apache.org/jira/browse/TIKA-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856660#action_12856660 ]

Chris A. Mattmann commented on TIKA-153:

The current Tika APIs are already pretty good, and I'd hate to complicate the clean Parser
interface with extra methods for different kinds of inputs. Instead I'm thinking of adding
a TikaInputStream utility class that extends InputStream with methods that allow accessing
the input document as a File.

The TikaInputStream class would have at least the following constructors:

    public TikaInputStream(InputStream stream) { ... }
    public TikaInputStream(File file) { ... }
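
As a rough illustration of the proposal above, here is a minimal sketch of how such a class could behave. Only the class name and the two constructors come from the comment; the getFile() method name, the temporary-file spooling, and the cleanup in close() are assumptions made for this example, not the actual Tika implementation.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Sketch only: a hypothetical shape for the proposed class, not Tika's real code.
class TikaInputStream extends FilterInputStream {

    private File file;          // backing file, or null if none is known yet
    private boolean temporary;  // true if this class created the file itself

    public TikaInputStream(InputStream stream) {
        super(stream);
    }

    public TikaInputStream(File file) throws FileNotFoundException {
        super(new FileInputStream(file));
        this.file = file;
    }

    // Hypothetical accessor: returns the document as a File.
    // If the input was a plain stream, it is spooled to a temporary file first.
    // Assumes no bytes were read before the call; otherwise only the
    // remaining bytes end up in the temporary file.
    public File getFile() throws IOException {
        if (file == null) {
            File tmp = File.createTempFile("tika-sketch-", ".tmp");
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            in.close();
            in = new FileInputStream(tmp); // later reads come from the file
            file = tmp;
            temporary = true;
        }
        return file;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            // Delete only files created here, never a caller-supplied file.
            if (temporary && file != null) {
                file.delete();
            }
        }
    }
}
```

With something like this, a parser that needs a File would call getFile() and get the caller's own file back without a copy when one was supplied, while stream-only callers would pay for the temporary file only when a parser actually asks for it.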


+100!! :) I could have used this for TIKA-400, since NetCDF expects its input as a File (and
only provides the means to deal with a File). This happens a lot: streaming doesn't make much
sense for data-intensive files with a huge memory footprint...


> Allow passing of files or memory buffers to parsers
> ---------------------------------------------------
>                 Key: TIKA-153
>                 URL: https://issues.apache.org/jira/browse/TIKA-153
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Priority: Minor
> Some of our parsers need to be able to go back and forth within a source document, so
> need either a file or (for smaller documents) an in-memory buffer that contains the full document.
> Currently we use temporary files for such cases, which in some cases means doing an extra
> copy of a file before it gets parsed. We should come up with some way for clients to pass
> in a file or a memory buffer if one is available.
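
For the in-memory case mentioned in the description, a small sketch of what "an in-memory buffer that contains the full document" could look like in plain Java. The BufferedDocument class and its method names are hypothetical and exist only for this illustration; the idea is that once the bytes are held in memory, a parser can open as many independent streams over them as it needs in order to go back and forth.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: holds a small document fully in memory.
class BufferedDocument {

    private final byte[] bytes;

    // Reads the source stream to its end and keeps the content in memory.
    // Only suitable for small documents, as the issue description notes.
    BufferedDocument(InputStream source) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = source.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        this.bytes = out.toByteArray();
    }

    // Each call returns a fresh stream positioned at the start of the document.
    InputStream openStream() {
        return new ByteArrayInputStream(bytes);
    }

    int length() {
        return bytes.length;
    }
}
```

A client that already has the document as a byte array would skip the copy in the constructor entirely, which is the kind of saving the issue is asking for.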

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

