lucene-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-11721) Isolate Tika and dependencies into separate jvm
Date Mon, 04 Dec 2017 14:43:01 GMT
Tim Allison created SOLR-11721:
----------------------------------

             Summary: Isolate Tika and dependencies into separate jvm
                 Key: SOLR-11721
                 URL: https://issues.apache.org/jira/browse/SOLR-11721
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Tim Allison


Tika should not be run in the same JVM as Solr.  Ever.

Upgrading Tika by hand, getting every dependency right and hoping to avoid jar hell,
is, um, error-prone.  See my recent failure, SOLR-11622, for which I apologize profusely.

Running the DataImportHandler (DIH) against Tika's unit test documents has been eye-opening.
It revealed other version-conflict and dependency failures that should have been caught
much earlier.

The fix is non-trivial, but we should work towards it.
I see two options:

1. TIKA-2514 -- Tika's current ForkParser offers a model for a minimal fork process + server
option.  Its current limitation is that all parsers and their dependencies must be
serializable, which is a problem for users who add their own parsers with dependencies
that were not designed for serialization.  The proposal there is to rework the ForkParser
to load all dependencies from a TIKA_HOME directory instead.

2. SOLR-7632 -- Use tika-server, but make it as seamless and easy (and secure!) to use as
the current handlers.
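
As a rough sketch of what option 2 looks like from the client side: the snippet below
assumes a tika-server instance is already running on its default port 9998 and uses its
`PUT /tika` endpoint with `Accept: text/plain`. The function name `extract_text`, the
timeout value, and the base URL are illustrative assumptions, not anything Solr ships.

```python
import urllib.request

# Assumption: tika-server is running locally on its default port.
TIKA_URL = "http://localhost:9998"


def extract_text(data: bytes, base_url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send raw document bytes to tika-server and return the extracted plain text.

    Parsing happens in the tika-server JVM, so a parser crash, an OOM, or a
    dependency conflict over there does not affect the calling process.
    """
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/tika",
        data=data,
        method="PUT",
        headers={
            # Ask for plain text rather than the default XHTML output.
            "Accept": "text/plain",
            # Generic type so the server detects the real format itself.
            "Content-Type": "application/octet-stream",
        },
    )
    # The timeout bounds how long a hung parser can block the caller.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A caller would read a file's bytes and pass them in, e.g.
`extract_text(open("report.pdf", "rb").read())`. The point of the design is that the only
thing shared with the parsing JVM is an HTTP connection, so none of Tika's jars need to sit
on the caller's classpath.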

Other thoughts, recommendations?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

