lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Tika analyzers
Date Wed, 30 Jul 2014 15:08:13 GMT
Hmmm, might a custom update processor do that? In an update
processor, you'd get the binary and be able to do anything at all
you wanted to with that. I'm not quite clear on how the binary
gets through the Tika bits and gets passed in in the first place,
but....

Best,
Erick


On Wed, Jul 30, 2014 at 6:00 AM, Tommaso Teofili <tommaso.teofili@gmail.com>
wrote:

> Hi all,
>
> while SolrCell works nicely when in need of indexing binary documents, I am
> wondering about the possibility of having Lucene / Solr documents that have
> binaries in specific Lucene fields, e.g. title="a nice doc",
> name="blabla.doc", binary="0x1234...".
>
> In that case the "binary" field should have an indexing analyzer which can
> extract the text from the binary and index it.
>
> Would it make sense to create a Tika based analyzer for that purpose?
>
> Regards,
> Tommaso
>
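
The update-processor idea in the reply can be sketched roughly as follows. This is a dependency-free illustration, not Solr or Tika code: the `Map` stands in for the incoming document, the `Function` stands in for a text extractor, and the `binary_text` field name and the `BinaryFieldSketch` class are made up for the example. In a real Solr setup the same logic would presumably live in the `processAdd` method of an `UpdateRequestProcessor` subclass, with something like Tika's `parseToString` doing the extraction.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Dependency-free sketch: the Map stands in for a Solr input document and
// the Function stands in for a Tika-backed text extractor.
class BinaryFieldSketch {

    // "binary" is the field name used in the thread's example.
    static final String BINARY_FIELD = "binary";
    // Assumed name for the field that receives the extracted text.
    static final String TEXT_FIELD = "binary_text";

    // Returns a copy of doc in which the raw bytes are replaced by extracted text.
    static Map<String, Object> process(Map<String, Object> doc,
                                       Function<byte[], String> extractor) {
        Map<String, Object> out = new HashMap<>(doc);
        Object raw = out.remove(BINARY_FIELD);
        if (raw instanceof byte[]) {
            out.put(TEXT_FIELD, extractor.apply((byte[]) raw));
        } else if (raw != null) {
            out.put(BINARY_FIELD, raw); // leave non-byte values as they were
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "a nice doc");
        doc.put("name", "blabla.doc");
        doc.put(BINARY_FIELD, "hello from the binary".getBytes(StandardCharsets.UTF_8));

        // Placeholder extractor: plain UTF-8 decoding. A real one would call a parser.
        Function<byte[], String> extractor = b -> new String(b, StandardCharsets.UTF_8);

        System.out.println(process(doc, extractor).get(TEXT_FIELD));
    }
}
```

Doing the extraction before analysis means the text field can use any ordinary analyzer chain, which may be simpler than teaching an analyzer to parse binaries.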
