lucene-dev mailing list archives

From "Chris A. Mattmann (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3295) Binaries contain 1.6 classes
Date Fri, 30 Mar 2012 14:28:31 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242376#comment-13242376 ]

Chris A. Mattmann commented on SOLR-3295:
-----------------------------------------

Hi Guys:

A couple of comments.

bq. Thanks for doing the test. I know this already because I hit that, too. It's caused by
TIKA's dependencies. The NetCDF (http://www.unidata.ucar.edu/software/netcdf/) parser is only
compiled with Java 1.6, although TIKA is also only Java 1.5, so this is a TIKA bug.

In Tika, I wouldn't classify this as a bug, since our parser jar dependencies can be excluded
in various ways. It's simply a requirement for folks who are interested in all of the features
that the NetCDF library provides; if you don't care about parsing those types of files,
you can simply omit that parser and exclude the jar dependency.
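For example, if you're embedding Tika directly (rather than going through Solr Cell), one way is to build the parser from an explicit list instead of letting it pick up every parser on the classpath, so the NetCDF parser and its jar never come into play. A rough sketch (the particular parsers chosen here are just for illustration):

{code:java}
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.parser.txt.TXTParser;
import org.apache.tika.sax.BodyContentHandler;

public class NoNetcdfExtraction {
    public static void main(String[] args) throws Exception {
        // Build the auto-detecting parser from an explicit parser list instead of the
        // full service-loaded set, so the NetCDF parser (and its jar) is never used.
        Parser parser = new AutoDetectParser(new TXTParser(), new PDFParser());

        Metadata metadata = new Metadata();
        InputStream stream = new FileInputStream(args[0]);
        try {
            parser.parse(stream, new BodyContentHandler(), metadata, new ParseContext());
        } finally {
            stream.close();
        }
        for (String name : metadata.names()) {
            System.out.println(name + " = " + metadata.get(name));
        }
    }
}
{code}

In the Solr binary case it's even simpler: just don't ship (or delete) solr/contrib/extraction/lib/netcdf-4.2-min.jar, or exclude the transitive NetCDF dependency in your build.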

bq. It's obscure, indeed, especially for people outside the climate community. 

Obscure? Sorry, not meaning to argue here, but that's patently untrue. All data formats
are at some level obscure, depending on the community you work in. The "climate" community
you're talking about includes a broad range of folks dealing with remote sensing, climate
modeling, decision making, etc., at some of the highest levels of government, funding, and
other areas, both in the U.S. and internationally. NetCDF, HDF, OPeNDAP, and other such formats
are broadly accepted standards. The use of NetCDF data, for example, resulted in more than
2,000 publications generated as part of the last Intergovernmental Panel on Climate Change
(IPCC) 4th assessment report :) So, not sure it's obscure.

bq. The UCAR netcdf library is on the other hand not able to handle streaming file input, so
TIKA loads the whole file into memory

Yep, that's more an issue with the underlying data file format than with the library itself.
A stream doesn't give you the random access the format needs, and yes, the current code I had
to bake into Tika unfortunately works around that by loading the whole file into memory. Jukka
and I have discussed better support for this, including temporary file support in Tika, and
we're working on improving it, but we're not there yet.
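The temp-file idea is basically: spool the incoming stream to a temporary file and hand that path to the NetCDF reader, so you avoid the whole-file byte array even though the library still wants random access to something on disk. A sketch of that pattern (not the actual Tika code, just the shape of the workaround):

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import ucar.nc2.NetcdfFile;

public class TempFileSpool {
    /**
     * Spools a (possibly non-seekable) stream to a temporary file so a library
     * that requires random access can open it, without holding the whole file
     * in memory.
     */
    public static NetcdfFile openFromStream(InputStream in) throws IOException {
        File tmp = File.createTempFile("netcdf-", ".nc");
        tmp.deleteOnExit();
        OutputStream out = new FileOutputStream(tmp);
        try {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            out.close();
        }
        // Hand the on-disk copy to the random-access reader.
        return NetcdfFile.open(tmp.getAbsolutePath());
    }
}
{code}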

bq. I don't really see the use-case for support in Solr

It's up to you guys. If you want to tell users of Solr, "hey, you can drop a scientific data
file onto Solr and magically its metadata will be indexed", then it might be important.
We do this in OODT quite often, and it's one of the core use cases (and we even use Lucene
and Solr for the metadata catalogs :) ).
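For reference, that "drop a file on Solr" path is just the ExtractingRequestHandler; something along these lines streams a file to /update/extract and lets Tika pull the metadata out. The URL, the literal.id value, and the content type are illustrative and assume the stock example setup:

{code:java}
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PostToSolrCell {
    public static void main(String[] args) throws Exception {
        // Assumes a default single-core Solr with the extraction contrib enabled.
        URL url = new URL("http://localhost:8983/solr/update/extract"
                + "?literal.id=doc1&commit=true");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/octet-stream");

        // Stream the raw file bytes as the POST body; Solr Cell hands them to Tika.
        InputStream in = new FileInputStream(args[0]);
        OutputStream out = conn.getOutputStream();
        try {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            out.close();
            in.close();
        }
        System.out.println("Solr responded: HTTP " + conn.getResponseCode());
    }
}
{code}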

bq. Loading a 500 Megabyte file into memory just to get the header

A lot of times that header contains the key parameters (spatial and temporal bounds) required
to decide what to do with the file, as well as other metadata fields, including the remote
sensing or climate variables being measured, valid units, links to publications, etc. So it's
far from useless information.
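To make that concrete: once the file is on disk, pulling just those header fields out is cheap. A rough sketch against the UCAR NetCDF-Java API (the same library behind netcdf-4.2-min.jar), printing the global attributes and the variables with their units:

{code:java}
import java.io.IOException;

import ucar.nc2.Attribute;
import ucar.nc2.NetcdfFile;
import ucar.nc2.Variable;

/** Prints the header metadata of a NetCDF file without reading its data arrays. */
public class NetcdfHeaderDump {
    public static void main(String[] args) throws IOException {
        NetcdfFile ncfile = NetcdfFile.open(args[0]); // path to a .nc file
        try {
            // Global attributes: typically title, institution, spatial/temporal bounds, etc.
            for (Attribute attr : ncfile.getGlobalAttributes()) {
                System.out.println("global attribute: " + attr);
            }
            // Variables: the measured quantities, with their units.
            for (Variable var : ncfile.getVariables()) {
                System.out.println("variable: " + var.getShortName()
                        + " (units: " + var.getUnitsString() + ")");
            }
        } finally {
            ncfile.close();
        }
    }
}
{code}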

bq. Right, but how many people have these gigabyte climate data files

Depends on who is using it. Like I said, this is pretty much all of the files that I deal
with :), but to each their own. Disabling it in Solr isn't really going to affect me (or others)
much, since OODT pretty much does this anyway, but meh.

> Binaries contain 1.6 classes
> ----------------------------
>
>                 Key: SOLR-3295
>                 URL: https://issues.apache.org/jira/browse/SOLR-3295
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.6
>
>         Attachments: output.log
>
>
> I've run this tool (does the job): http://code.google.com/p/versioncheck/ on the checkout
> of branch_3x. To my surprise there is a JAR which contains Java 1.6 code:
> {noformat}
> Major.Minor Version : 50.0             JAVA compatibility : Java 1.6 platform: 45.3-50.0
> Number of classes : 60
> Classes are : 
> c:\Work\lucene-solr\.\solr\contrib\extraction\lib\netcdf-4.2-min.jar [:] ucar/unidata/geoloc/Bearing.class
> ...
> {noformat}
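For reference, the "Major.Minor Version : 50.0" that the tool reports comes straight out of the class-file header (major 49 = Java 5, 50 = Java 6), so a minimal JDK-only spot check of a single .class file looks roughly like this (a sketch, not the linked tool):

{code:java}
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

/** Reads the class-file version from a .class file (major 49 = Java 5, 50 = Java 6). */
public class ClassVersionCheck {
    public static void main(String[] args) throws IOException {
        DataInputStream in = new DataInputStream(new FileInputStream(args[0]));
        try {
            int magic = in.readInt();
            if (magic != 0xCAFEBABE) {
                throw new IOException("not a class file: " + args[0]);
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            // Major version 44 + N corresponds to Java 1.N.
            System.out.println("major.minor = " + major + "." + minor
                    + " (Java 1." + (major - 44) + ")");
        } finally {
            in.close();
        }
    }
}
{code}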

