lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: ExtractingRequestHandler - extracted files caching?
Date Tue, 01 Jul 2014 04:16:12 GMT
Here's an example of what Alexandre is
talking about:
http://searchhub.org/2012/02/14/indexing-with-solrj/

It mixes database fetching in with the
Tika processing, but that should be pretty easy
to pull out.
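
A minimal sketch of the client-side caching idea, not taken from the linked
article: extracted text is stored on disk under the SHA-256 of the file bytes,
so a metadata-only change reuses the cached text instead of re-running
extraction. The names ExtractionCache and build_solr_doc, the "text" field,
and the extractor callable (a stand-in for whatever wraps Tika on your side)
are assumptions for illustration.

```python
import hashlib
import os


class ExtractionCache:
    """Caches extracted text on disk, keyed by the SHA-256 of the file bytes."""

    def __init__(self, cache_dir, extractor):
        self.cache_dir = cache_dir
        # extractor: callable taking bytes and returning str.
        # Hypothetical stand-in for a Tika call made on the client side.
        self.extractor = extractor
        os.makedirs(cache_dir, exist_ok=True)

    def _path_for(self, data):
        digest = hashlib.sha256(data).hexdigest()
        return os.path.join(self.cache_dir, digest + ".txt")

    def get_text(self, data):
        """Return cached text if present; otherwise extract and cache it."""
        path = self._path_for(data)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                return f.read()
        text = self.extractor(data)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        return text


def build_solr_doc(doc_id, text, metadata):
    """Combine cached text with current app metadata into one document.

    The "id" and "text" field names are assumptions; use your schema's names.
    """
    doc = {"id": doc_id, "text": text}
    doc.update(metadata)
    return doc
```

When only the metadata changes, the file bytes hash to the same key, so
get_text() reads the cached text and the full document can be re-sent to
Solr without touching the extractor.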

Best,
Erick

On Mon, Jun 30, 2014 at 8:21 PM, Alexandre Rafalovitch
<arafalov@gmail.com> wrote:
> Under the covers, Tika is used. You can use Tika yourself on the
> client side and cache its output in a database or text file. Then,
> send that to Solr instead. Puts less load on Solr as well.
>
> Or you can use atomic update, but then all the primary (not copyField)
> fields must be stored="true".
>
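
For the atomic update route, a rough sketch of building the JSON body: each
changed field is wrapped in a {"set": value} modifier, and the result would be
POSTed to the core's update handler as application/json. The helper name
atomic_set_update and the "id" unique-key name are assumptions; downloadCount
comes from the original question.

```python
import json


def atomic_set_update(doc_id, changes, id_field="id"):
    """Build a Solr atomic-update JSON body that sets the given fields.

    id_field is assumed to be the schema's uniqueKey field.
    """
    doc = {id_field: doc_id}
    for field, value in changes.items():
        # "set" replaces the field's current value; other fields are
        # rebuilt by Solr from their stored values.
        doc[field] = {"set": value}
    return json.dumps([doc])


body = atomic_set_update("doc1", {"downloadCount": 42})
```

Because Solr reconstructs the rest of the document from stored values, this
only works when the extracted text field (and every other primary field) is
stored, as noted above.
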
> Regards,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
>
>
> On Tue, Jul 1, 2014 at 5:55 AM, Gili Nachum <gilinachum@gmail.com> wrote:
>> Hello,
>>
>> I plan to use ExtractingRequestHandler to index binary files' text plus app
>> metadata (like literal.downloadCount and others) into a single document.
>> I expect the app metadata to change much more often than the binary file
>> itself. I would hate to have to extract text from the binary file whenever
>> I need to re-index the doc because of a metadata change.
>> Is there some extraction caching solution for file content? Or some
>> other workaround?
>>
>> Thanks!
