lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: How to implement a GivenCharFilter using incrementToken
Date Wed, 25 Nov 2009 10:05:56 GMT
I do not understand your request completely, maybe you tell us some more
requirements of your implementation.

The example you have given is invalid, as offsets should always refer to the
original position in the source stream, so should be:

a(0,1,1) b(2,3,1) a(6,7,1) c(11,12,1).

The second problem, why extend TokenFilter? The super ctor call is wrong, it
must get an other TokenStream as input. Maybe you want to write an Tokenizer
reading from stream?

Should a,b be always single chars or any term?

If the above offsets are correct and what you want (otherwise it would make
no sense), the following would work without any custom class:

Reader input = StringReader("axb xxa xx c");
NormalizeCahrMap map = new NormalizeCharMap();
Map.add("x"," "); // replace all x by whitespace
input = MappingCharFilter(map,input);
TokenStream stream = new WhitespaceTokenizer(input);

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: KingShooter [mailto:se.dxt@hotmail.com]
> Sent: Wednesday, November 25, 2009 5:16 AM
> To: java-user@lucene.apache.org
> Subject: How to implement a GivenCharFilter using incrementToken
> 
> 
> Hi,
>     I find it is very hard to implement a GivenCharFilter(extends
> TokenFilter)using incrementToken. My requirment is like this: I want to
> analyze a StringReader("axb xxa xx c") to these token[term(startOffset,
> endOffset, posIncre)]:
> a(0,1,1) b(2,3,1) a(4,5,1) c(6,7,1).
>     First I use a WhiteSpaceFilter to filter the token, and then use
> GivenCharFilter( assume filter "x" like above). the problem is: when call
> incrementToken, I get attributeTerm "axb", but this time I need return two
> attribute terms "a" and "b", but how to? because it just return only one
> term.
>    Please help me.
> 
> My code just like this:
> 
> ============================================
> 
> public class GivenCharFilter extends TokenFilter{
> 
>   private char filterChar='x';
>   private TermAttribute termAtt;
>   private OffsetAttribute offsetAtt;
>   private PositionIncrementAttribute posIncrAtt;
> 
>   public GivenCharFilter(){
>      init();
>   }
> 
>   public void init() {
>         termAtt = (TermAttribute) addAttribute(TermAttribute.class);
>         offsetAtt = (OffsetAttribute) addAttribute(OffsetAttribute.class);
>         posIncrAtt = (PositionIncrementAttribute)
> addAttribute(PositionIncrementAttribute.class);
>   }
> 
> 
>   /** Returns the next token in the stream, or null at EOS. */
>   public final boolean incrementToken() throws IOException {
>       //How To?
>   }
> 
> }
> ================================
> 
> Regards,
> Xiaotao Deng
> --
> View this message in context: http://old.nabble.com/How-to-implement-a-
> GivenCharFilter-using-incrementToken-tp26507318p26507318.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message