lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: ExtractingRequestHandler - extracted files caching?
Date Tue, 01 Jul 2014 03:21:27 GMT
Under the covers, Tika is used. You can run Tika yourself on the
client side and cache its output in a database or a text file, then
send that text to Solr instead. This also puts less load on Solr.
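A minimal sketch of such a client-side cache, assuming the cache key is a SHA-256 hash of the file's bytes so that text is re-extracted only when the binary itself changes. The class name `ExtractedTextCache` and its methods are hypothetical, and the extractor is passed in as a plain function. With Tika on the classpath, that function could wrap `new Tika().parseToString(new ByteArrayInputStream(bytes))`, which throws checked exceptions you would need to handle; the sketch itself uses only the JDK.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Function;

/** Hypothetical helper: caches extracted text on disk, keyed by the SHA-256 of the file content. */
final class ExtractedTextCache {
    private final Path cacheDir;
    private final Function<byte[], String> extractor; // hypothetical hook, e.g. a wrapper around Tika
    private int extractions = 0;

    ExtractedTextCache(Path cacheDir, Function<byte[], String> extractor) throws IOException {
        this.cacheDir = Files.createDirectories(cacheDir);
        this.extractor = extractor;
    }

    /** Returns the extracted text, running the extractor only on a cache miss. */
    String textFor(Path binaryFile) throws IOException {
        byte[] bytes = Files.readAllBytes(binaryFile);
        Path cached = cacheDir.resolve(sha256Hex(bytes) + ".txt");
        if (Files.exists(cached)) {
            return Files.readString(cached, StandardCharsets.UTF_8); // hit: no extraction
        }
        String text = extractor.apply(bytes); // miss: extract once and persist the result
        extractions++;
        Files.writeString(cached, text, StandardCharsets.UTF_8);
        return text;
    }

    /** Number of times the extractor actually ran. */
    int extractionCount() {
        return extractions;
    }

    private static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            return String.format("%064x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The cached text, together with the frequently changing metadata, can then be sent to Solr's regular update handler as an ordinary document, so a metadata change costs a cache lookup rather than a fresh extraction.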

Alternatively, you can use atomic updates, but then all the primary (non-copyField)
fields must be stored="true".
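For the atomic-update route, Solr accepts per-field modifiers such as `set` and `inc` in a JSON update posted to the core's `/update` handler. The sketch below uses a hypothetical `AtomicUpdate.body` helper and assumes the uniqueKey field is named `id`. Bumping `downloadCount` this way keeps the extracted text only if that text field is stored, because Solr rebuilds the rest of the document from stored values.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: builds a Solr atomic-update JSON body touching only the listed fields. */
final class AtomicUpdate {
    // Assumes the schema's uniqueKey field is called "id".
    static String body(String id, Map<String, Long> inc, Map<String, String> set) {
        StringJoiner fields = new StringJoiner(",");
        fields.add("\"id\":" + quote(id));
        // "inc" adds a delta to a numeric field; "set" replaces a field's value.
        inc.forEach((name, delta) -> fields.add(quote(name) + ":{\"inc\":" + delta + "}"));
        set.forEach((name, value) -> fields.add(quote(name) + ":{\"set\":" + quote(value) + "}"));
        return "[{" + fields + "}]";
    }

    // Minimal JSON string escaping (backslashes and double quotes only) for this sketch.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

The resulting string would be POSTed with `Content-Type: application/json`; fields not mentioned in the body are carried over from the stored copy of the document.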

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Jul 1, 2014 at 5:55 AM, Gili Nachum <gilinachum@gmail.com> wrote:
> Hello,
>
> I plan to use ExtractingRequestHandler to index the text of binary files plus app
> metadata (like literal.downloadCount and others) into a single document.
> I expect the app metadata to change much more often than the binary file
> itself. I would hate to have to extract text from the binary file whenever
> I need to re-index the doc because of a metadata change.
> Is there some caching solution for extracted file content, or some
> other workaround?
>
> Thanks!
