lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Custom tokenizer
Date Mon, 12 Jan 2015 08:28:21 GMT
> Thanks for the reply.
> 
> Hmm, I understand.
> I know about AnalyzerWrapper, but that is not what I am looking for.
> 
> I also know about cloning and overriding. I want my analyzer to behave
> exactly the same as EnglishAnalyzer and right now I am copying the code
> from the EnglishAnalyzer to mimic the behavior, which is a dirty solution.
> Is there any other proper solution(s) to this problem?

NO.

The Analyzers that ship with Lucene each have a fixed configuration (a combination of Tokenizers
and Filters) that won't change unless the matchVersion differs (this is documented in the Javadocs).
The reason is this: once you have indexed with a given analyzer, you must always use it unmodified
when updating or searching the index; otherwise the results of those actions are undefined.
So after a Lucene update, every Analyzer should return exactly the same results as before.
Otherwise all users would need to rebuild their indexes even on minor version upgrades.
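
To make the "undefined results" point concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: analyzeOld, analyzeNew, index and matches are hypothetical names, and the naive plural stripper stands in for any change to an analysis chain. It only shows what happens when the index was written with one chain and is then searched with another.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Toy model in plain Java; none of these names are Lucene APIs.
final class AnalysisMismatchDemo {

    // "Old" chain: split on whitespace, then lowercase.
    static List<String> analyzeOld(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // "New" chain: the old chain plus a naive plural stripper
    // (a hypothetical change that a library update might introduce).
    static List<String> analyzeNew(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : analyzeOld(text)) {
            boolean plural = term.length() > 1 && term.endsWith("s");
            terms.add(plural ? term.substring(0, term.length() - 1) : term);
        }
        return terms;
    }

    // The "index" is just the set of terms the analyzer produced.
    static Set<String> index(String document, Function<String, List<String>> analyzer) {
        return new HashSet<>(analyzer.apply(document));
    }

    // A query matches if all of its analyzed terms are in the index.
    static boolean matches(Set<String> indexedTerms, String query,
                           Function<String, List<String>> analyzer) {
        return indexedTerms.containsAll(analyzer.apply(query));
    }

    public static void main(String[] args) {
        Set<String> indexed = index("Quick brown dogs", AnalysisMismatchDemo::analyzeOld);
        // Same chain at index and search time: the term "dogs" is found.
        System.out.println(matches(indexed, "dogs", AnalysisMismatchDemo::analyzeOld));
        // Changed chain at search time: the query becomes "dog", which was never indexed.
        System.out.println(matches(indexed, "dogs", AnalysisMismatchDemo::analyzeNew));
    }
}
```

In this toy the second lookup misses, although the document clearly contains the word. That is the kind of silent breakage a stable, version-pinned analyzer configuration avoids.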

Also, think of the Lucene Analyzers as "example" code. What counts is the combination of Tokenizers
and TokenFilters, which is freely configurable. The ones provided by Lucene are useful for
common cases, but whenever you have custom requirements, you have to define your Analyzer
*completely* yourself. This is also what Solr and Elasticsearch users do in their config files.
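
As a rough illustration of "define the chain completely yourself", here is a sketch in plain Java. It is deliberately not Lucene code: ToyAnalyzer and its methods are hypothetical stand-ins that only mimic the roles of a Tokenizer, a list of TokenFilters, and the Analyzer that combines them. The stop word list and the hyphen-splitting tokenizer are made up for the example. A real implementation would override createComponents() as shown in the Analyzer Javadocs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Toy model in plain Java; these names only mimic the roles of Lucene's
// Tokenizer, TokenFilter and Analyzer and are not Lucene APIs.
final class ToyAnalyzer {
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "of", "the"));

    private final Function<String, List<String>> tokenizer;
    private final List<UnaryOperator<List<String>>> filters;

    ToyAnalyzer(Function<String, List<String>> tokenizer,
                List<UnaryOperator<List<String>>> filters) {
        this.tokenizer = tokenizer;
        this.filters = filters;
    }

    // Runs the tokenizer first, then every filter in order.
    List<String> analyze(String text) {
        List<String> tokens = tokenizer.apply(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    static List<String> splitOn(String text, String regex) {
        List<String> out = new ArrayList<>();
        for (String token : text.split(regex)) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    static List<String> dropStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // The filter part of the chain, written out explicitly so it can be reused.
    static List<UnaryOperator<List<String>>> englishLikeFilters() {
        List<UnaryOperator<List<String>>> filters = new ArrayList<>();
        filters.add(ToyAnalyzer::lowerCase);
        filters.add(ToyAnalyzer::dropStopWords);
        return filters;
    }

    // Stand-in for a stock analyzer: whitespace tokenizer plus the filters above.
    static ToyAnalyzer englishLike() {
        return new ToyAnalyzer(text -> splitOn(text, "\\s+"), englishLikeFilters());
    }

    // A separate analyzer with the same filters but a custom tokenizer
    // that also splits on hyphens.
    static ToyAnalyzer withCustomTokenizer() {
        return new ToyAnalyzer(text -> splitOn(text, "[\\s-]+"), englishLikeFilters());
    }

    public static void main(String[] args) {
        String text = "The state-of-the-art Index";
        System.out.println(englishLike().analyze(text));
        System.out.println(withCustomTokenizer().analyze(text));
    }
}
```

The custom variant is a second, fully spelled-out chain rather than a patched copy of the first. In this sketch the stock-like chain keeps "state-of-the-art" as one term, while the custom chain yields "state", "art" and "index".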

Uwe

> Thank you.
> 
> On Mon, Jan 12, 2015 at 1:36 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> 
> > Hi,
> >
> > Extending an existing Analyzer is not useful, because it is just a
> > factory that returns a TokenStream instance to consumers. If you want
> > to change the Tokenizer of an existing Analyzer, just clone it and
> > rewrite its createComponents() method, see the example in the Javadocs:
> >
> > http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/analysis/Analyzer.html
> >
> > If you want to add additional TokenFilters to the chain, you can do
> > this with AnalyzerWrapper
> > (http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/analysis/AnalyzerWrapper.html),
> > but this does not work with Tokenizers, because
> > those are instantiated before the TokenFilters which depend on them,
> > so changing the Tokenizer afterwards is impossible.
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> > > -----Original Message-----
> > > From: Vihari Piratla [mailto:viharipiratla@gmail.com]
> > > Sent: Monday, January 12, 2015 8:51 AM
> > > To: java-user@lucene.apache.org
> > > Subject: Custom tokenizer
> > >
> > > Hi,
> > > I am trying to implement a custom tokenizer for my application and I
> > > have a few questions about it.
> > > 1. Is there a way to give an existing analyzer (say EnglishAnalyzer)
> > > the custom tokenizer and make it use this tokenizer instead of, say,
> > > StandardTokenizer?
> > > 2. Why are analyzers such as StandardAnalyzer and EnglishAnalyzer defined final?
> > > Because of that, I cannot extend them.
> > >
> > > Thank you.
> > > --
> > > V
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> --
> V



