lucene-solr-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?
Date Thu, 12 Apr 2018 16:24:44 GMT
There's also, of course, tika-server. 😊

No matter the method, it is always best to isolate Tika in its own JVM, VM, or machine.
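
For illustration only, here is a minimal Python sketch of what talking to a separately running tika-server might look like. It assumes the server is listening on its default port 9998 on localhost and uses the `/tika` endpoint with a PUT request. The function names `build_tika_request` and `extract_text` are hypothetical helpers made up for this sketch, not part of Tika or Solr.

```python
import urllib.request

# Assumption: tika-server runs as its own process on the default port 9998.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(html_bytes, url=TIKA_URL):
    """Build (but do not send) a PUT request asking tika-server for plain text."""
    return urllib.request.Request(
        url,
        data=html_bytes,
        method="PUT",
        headers={
            "Content-Type": "text/html",  # tells Tika what we are sending
            "Accept": "text/plain",       # ask for extracted text, not XHTML
        },
    )


def extract_text(html_bytes, url=TIKA_URL, timeout=30):
    """Send the document to tika-server and return the extracted text."""
    request = build_tika_request(html_bytes, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Because the parsing happens in the server process, a document that makes Tika hang or run out of memory takes down that process rather than the indexing client; the `timeout` lets the caller give up and move on.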

-----Original Message-----
From: Charlie Hull [mailto:charlie@flax.co.uk] 
Sent: Monday, April 9, 2018 4:15 PM
To: solr-user@lucene.apache.org
Subject: Re: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

As a bonus, here's a Dropwizard Tika wrapper, written by a colleague of mine at Flax, that gives you a Tika web service: https://github.com/mattflax/dropwizard-tika-server
Hope this is useful.

Cheers

Charlie

