lucene-java-user mailing list archives

From zzT <>
Subject RE: Lucene 4.0 tokenstream logic
Date Tue, 16 Jul 2013 19:04:33 GMT
Hi Uwe,

Thanks for your immediate response, and sorry for my late reply. I managed to
solve my problem. Your comment was enough to guide me in the right direction.

The problem was indeed inside my custom Analyzers/Tokenizers. The key point
here is that createComponents() is called only once and its components are
then reused, while the old tokenStream() created a fresh stream on every
call, right? It turned out I had code that ran only for the first input, so
some refactoring was needed.
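That reuse contract is the crux: since the components are constructed once, any per-input work has to happen when the stream is reset, not at construction time. A minimal, Lucene-free sketch of the pattern (all names here are hypothetical, chosen only to illustrate the bug described above):

```java
// Hypothetical, Lucene-free sketch of the reuse contract: the tokenizer
// object is built ONCE (like createComponents()), so per-input work must
// happen in reset(), not in the constructor.
import java.util.ArrayList;
import java.util.List;

class ReusableTokenizer {
    private final List<String> tokens = new ArrayList<>();
    private int pos;

    // Runs once, like createComponents(): no per-input logic here.
    ReusableTokenizer() {
    }

    // Runs for every new input, like reset(): per-input work belongs here.
    void reset(String input) {
        tokens.clear();
        pos = 0;
        for (String t : input.split("\\s+")) {
            tokens.add(t.toLowerCase());
        }
    }

    // Returns the next token, or null when the stream is exhausted.
    String next() {
        return pos < tokens.size() ? tokens.get(pos++) : null;
    }

    public static void main(String[] args) {
        ReusableTokenizer tok = new ReusableTokenizer(); // constructed once
        tok.reset("First Doc");
        System.out.println(tok.next()); // prints "first"
        tok.reset("Second Doc");        // same object, new input
        System.out.println(tok.next()); // prints "second"
    }
}
```

Had the per-input analysis been done in the constructor instead of reset(), only the first input would ever have been analyzed, which is exactly the "code executed only for the 1st input" symptom.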

I'll provide an example in case it helps someone else out there. Below is the
design's evolution until it finally worked properly in 4.3:
 -> CustomAnalyzer.tokenStream()
     return new CustomTokenizer(tokenize(input))
(the tokenize() function performs some analysis on the input)

 -> CustomAnalyzer.createComponents()
     return new TokenStreamComponents(new CustomTokenizer(tokenize(input)))
(here tokenize() is actually called only once, so only the first input was analyzed)

 -> CustomAnalyzer.createComponents()
     Tokenizer source = new CustomTokenizer()
     return new TokenStreamComponents(source, new CustomFilter(source))

where all the logic from tokenize(input) has been moved into a separate
filter (CustomFilter), inside its incrementToken() method.
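For completeness, here is roughly what that final design looks like against the Lucene 4.x Analyzer API. This is a sketch rather than the poster's actual code: the lowercasing in CustomFilter stands in for whatever tokenize(input) did, and WhitespaceTokenizer stands in for the custom tokenizer.

```java
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public final class CustomAnalyzer extends Analyzer {

    /** Per-token logic lives in incrementToken(), so it runs for every
     *  token of every reused stream -- not just for the first input. */
    private static final class CustomFilter extends TokenFilter {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

        CustomFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                return false;
            }
            // illustrative per-token analysis (previously in tokenize(input))
            String analyzed = termAtt.toString().toLowerCase();
            termAtt.setEmpty().append(analyzed);
            return true;
        }
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // createComponents() runs only once per reused stream; afterwards the
        // components are reset() and fed each new piece of input.
        Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_43, reader);
        return new TokenStreamComponents(source, new CustomFilter(source));
    }
}
```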

Hope it makes sense!

And yes, of course the stock Lucene analyzers work just fine. I shouldn't
even have suggested otherwise, especially without actually testing against them.
