lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: Reusable tokenstream
Date Wed, 22 Nov 2017 11:33:11 GMT
Hi Roxana,
I think you can use TeeSinkTokenFilter
<https://lucene.apache.org/core/5_4_0/analyzers-common/org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.html>,
as suggested earlier.
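For reference, TeeSinkTokenFilter's model (per its Javadoc) is capture-and-replay: the tee consumes the upstream token stream once, caching the token states, and each sink replays the cache through its own filter. Lucene itself isn't reproduced here, so this is a stdlib-only sketch of that pattern with hypothetical names, not the real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative stand-in for a token: real Lucene caches full attribute
// states; here a token is reduced to (term, type).
final class Token {
    final String term, type;
    Token(String term, String type) { this.term = term; this.type = type; }
}

// The "tee": consumes the upstream tokens once and caches them, then
// hands out sinks that replay the cached tokens through a per-sink filter.
final class TeeSink {
    private final List<Token> cached = new ArrayList<>();
    TeeSink(List<Token> upstream) { cached.addAll(upstream); } // single analysis pass
    List<Token> newSink(Predicate<Token> filter) {
        List<Token> out = new ArrayList<>();
        for (Token t : cached) if (filter.test(t)) out.add(t);
        return out;
    }
}

public class TeeSinkSketch {
    public static void main(String[] args) {
        // Pretend this list is the output of the (expensive) shared analysis.
        List<Token> analyzed = List.of(
            new Token("run", "VERB"), new Token("quick", "ADJ"), new Token("fox", "NOUN"));
        TeeSink tee = new TeeSink(analyzed);
        List<Token> verbs = tee.newSink(t -> t.type.equals("VERB"));
        List<Token> adjectives = tee.newSink(t -> t.type.equals("ADJ"));
        System.out.println(verbs.get(0).term);      // run
        System.out.println(adjectives.get(0).term); // quick
    }
}
```

With the real class, the per-sink filtering step would instead be ordinary TokenFilters (e.g. a type filter) stacked on top of each sink stream.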

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 22 Nov 2017, at 11:43, Roxana Danger <roxana.danger@gmail.com> wrote:
> 
> Hi Emir,
> Many thanks for your reply.
> The UpdateProcessor can do this work, but is analyzer.reusableTokenStream
> <https://lucene.apache.org/core/3_0_3/api/core/org/apache/lucene/analysis/Analyzer.html#reusableTokenStream(java.lang.String,%20java.io.Reader)>
> the way to obtain a previously generated tokenstream? Is it guaranteed to
> give access to the existing token stream rather than reconstructing it?
> Thanks,
> Roxana
> 
> 
> On Wed, Nov 22, 2017 at 10:26 AM, Emir Arnautović <
> emir.arnautovic@sematext.com> wrote:
> 
>> Hi Roxana,
>> I don’t think that it is possible. In some cases (yours seems like a good
>> fit) you could create a custom update request processor that does the
>> shared analysis (you can have it defined in the schema) and, after
>> analysis, uses those tokens to create new values for the two fields and
>> removes the source value (or flags it as ignored in the schema).
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 22 Nov 2017, at 11:09, Roxana Danger <roxana.danger@gmail.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> I would like to reuse the tokenstream generated for one field to create a
>>> new tokenstream for another field (adding a few filters to the existing
>>> tokenstream), without running the whole analysis again.
>>> 
>>> The particular application is:
>>> - I have a field *tokens* that uses an analyzer that generates the tokens
>>> (and maintains the token type attributes)
>>> - I would like to have two new fields: *verbs* and *adjectives*. These
>>> should reuse the tokenstream generated for the field *tokens*, keeping
>>> only the verbs and the adjectives for the respective fields.
>>> 
>>> Is this feasible? How should it be implemented?
>>> 
>>> Many thanks.
>> 
>> 
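To make the update-processor route suggested earlier in the thread concrete: it boils down to one analysis pass over the source field, fanning tokens out by type into the two target fields, then dropping the source field. Solr's UpdateRequestProcessor API is not reproduced here; this is a hedged, stdlib-only sketch with hypothetical names (a toy typeOf stands in for running the schema analyzer):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an input document: field name -> values.
public class SplitByTypeProcessorSketch {

    // Stand-in for the shared analysis chain: a toy "tagger" that types a
    // few known words. Real code would run the analyzer defined in schema.
    static String typeOf(String token) {
        switch (token) {
            case "run": case "jump": return "VERB";
            case "quick": case "lazy": return "ADJ";
            default: return "NOUN";
        }
    }

    // One analysis pass over the source field; the derived fields get the
    // filtered tokens and the source field is removed (alternatively it
    // could be flagged as ignored in the schema).
    static Map<String, List<String>> processAdd(Map<String, List<String>> doc) {
        List<String> tokens = doc.remove("tokens");
        List<String> verbs = new ArrayList<>(), adjectives = new ArrayList<>();
        for (String tok : tokens) {
            String type = typeOf(tok);           // analysis happens once
            if (type.equals("VERB")) verbs.add(tok);
            else if (type.equals("ADJ")) adjectives.add(tok);
        }
        doc.put("verbs", verbs);
        doc.put("adjectives", adjectives);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("tokens", List.of("quick", "fox", "run"));
        processAdd(doc);
        System.out.println(doc.get("verbs"));      // [run]
        System.out.println(doc.get("adjectives")); // [quick]
    }
}
```

In a real processor this logic would live in a ProcessorAdd-style hook on the indexing chain, so the two fields never need their own analysis pass.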

