tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-400) netCDF Tika Parser
Date Wed, 14 Apr 2010 14:24:51 GMT

    [ https://issues.apache.org/jira/browse/TIKA-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856910#action_12856910 ]

Chris A. Mattmann commented on TIKA-400:

Hey Jukka:

I think we do, since the NetCDF lib relies on it. I agree with you on accessing internal resources.
The problem is that this NetCDF library (which seems to be the most used/maintained from a Java
perspective) expects to control how content is delivered to it as well.
In fact, NetCDF and HDF concern themselves not only with obtaining data from a particular
stream/content, but also with how that content is represented: because the data volumes are so
large, they have to optimize how the data is extracted and represented for access.

So, I actually ran into something similar here: the core abstraction for
opening a NetcdfFile in the lib takes only a File as input -- it's really hard to pass it
a stream, which is what Tika expects. Arg! Very frustrating indeed. I'll look around and see
if there is another ASL-friendly NetCDF Java library (does anyone else know of one?).
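
One possible workaround for a File-only API, short of switching libraries, is to spool the
incoming stream to a temporary file and hand that file to the library. Below is a minimal
sketch of that idea using only the JDK. The names StreamSpooler, spoolToTempFile, withTempFile
and FileConsumer are hypothetical and are not part of Tika or the NetCDF library; the
file-based open call is left as a callback on purpose, so nothing here assumes a particular
NetCDF API. Note that this copies the whole stream to disk, which may be costly for the
large data volumes mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: bridges a stream-based caller to a File-only reader.
final class StreamSpooler {
    private StreamSpooler() {}

    /** Stand-in for whatever File-only opener the library provides (assumption). */
    interface FileConsumer<T> {
        T apply(Path file) throws IOException;
    }

    /** Copies the whole stream into a new temporary file and returns its path. */
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("tika-netcdf-", ".nc");
        try {
            // createTempFile already created the file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave a partial file behind
            throw e;
        }
    }

    /** Spools the stream, runs the file-based reader, then removes the temp file. */
    static <T> T withTempFile(InputStream in, FileConsumer<T> reader) throws IOException {
        Path tmp = spoolToTempFile(in);
        try {
            return reader.apply(tmp);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A parser could then call something like StreamSpooler.withTempFile(stream, path -> ...) and
open the library's file-based reader on path.toString() inside the callback, so the temp
file is cleaned up even if parsing fails.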


> netCDF Tika Parser
> ------------------
>                 Key: TIKA-400
>                 URL: https://issues.apache.org/jira/browse/TIKA-400
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>         Environment: indep. of env.
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 0.8
> Along with TIKA-399, netCDF is also a widely used scientific data format. I'm going to
> throw up a Tika parser that can deal with netCDF.


