lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From KingShooter <>
Subject RE: How to implement a GivenCharFilter using incrementToken
Date Thu, 26 Nov 2009 01:04:37 GMT

The example you have given is invalid, as offsets should always refer to the
original position in the source stream, so should be:

a(0,1,1) b(2,3,1) a(6,7,1) c(11,12,1).

Deng: I'm afraid that if (case1) index "axxxb", and then I search "axb" or
(case2) index "axb" and then search "axxxxb", which one would work? So I
just want to omit the destence between filter term.

The second problem, why extend TokenFilter? The super ctor call is wrong, it
must get an other TokenStream as input.
Deng:  You are right:)  Because in my porject, the old version implement it
using "nextToken" and extends TokenFilter. And then I want to rewrite it
using incrementToken method, so I want to know how to achieve it, maybe it's

Should a,b be always single chars or any term?
Deng: a,b,c here I just use single char make it simple, it can be any term.
and also "x" can be a set contain many filter chars(maybe more than 1,000)

If the above offsets are correct and what you want (otherwise it would make
no sense), the following would work without any custom class:

Reader input = StringReader("axb xxa xx c");
NormalizeCahrMap map = new NormalizeCharMap();
Map.add("x"," "); // replace all x by whitespace
input = MappingCharFilter(map,input);
TokenStream stream = new WhitespaceTokenizer(input);

Deng: That's good idea. And I get answer from you. Thank you very much.

Another question is:
     When incrementToken get a token "abc#def#hi", How to preserve current
"def#hi" and then return attrTerm "abc", and next I call incrementToken I
can get "def#hi".
? ("#" stand for token separator, and I want to get 3 tokens(abc, def, hi)
from original token).

View this message in context:
Sent from the Lucene - Java Users mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message