lucene-java-user mailing list archives

From Denis Bazhenov <dot...@gmail.com>
Subject Re: getting the token position
Date Fri, 11 Jan 2013 00:24:14 GMT
What you are looking for is OffsetAttribute. Also consider using the stock ShingleFilter on
a stream where each break is marked by a position increment > 1, and then filtering out the
shingles that contain "_" (underscore), the filler token ShingleFilter inserts for the gap.
I guess that will be easier.
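Here is a minimal plain-Java model of the filler-token approach, showing why filtering on "_" works. It does not call Lucene. The hypothetical `Tok` record stands in for a token's term text plus its PositionIncrementAttribute value. It also assumes that some upstream stage has already turned each punctuation break into a position gap. It models ShingleFilter with a maximum shingle size of 2 and unigram output turned off.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the suggested approach; it does NOT call Lucene.
// Tok is a hypothetical stand-in for one token's term text plus its
// position increment (what PositionIncrementAttribute would report).
public class FillerShingles {
    record Tok(String term, int posInc) {}

    static final String FILLER = "_"; // ShingleFilter's default filler token

    // Bigram shingles: fill each position gap with FILLER (as ShingleFilter
    // does), then drop every shingle that contains FILLER.
    static List<String> bigrams(List<Tok> tokens) {
        List<String> filled = new ArrayList<>();
        for (Tok t : tokens) {
            // posInc > 1 means (posInc - 1) empty positions precede this token
            for (int i = 1; i < t.posInc(); i++) {
                filled.add(FILLER);
            }
            filled.add(t.term());
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < filled.size(); i++) {
            String shingle = filled.get(i) + " " + filled.get(i + 1);
            if (!shingle.contains(FILLER)) {
                out.add(shingle);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "token number one, word two" with the comma already turned into a gap of 2
        List<Tok> toks = List.of(
            new Tok("token", 1), new Tok("number", 1), new Tok("one", 1),
            new Tok("word", 2), new Tok("two", 1));
        System.out.println(bigrams(toks)); // [token number, number one, word two]
    }
}
```

One caveat: a real term that itself contains an underscore would be dropped too. If your Lucene version has `ShingleFilter.setFillerToken(String)`, you can choose a filler that cannot occur in your terms.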

On Jan 11, 2013, at 7:14 AM, Igal @ getRailo.org <igal@getrailo.org> wrote:

> hi all,
> 
> how can I get the token's position from the TokenStream / Tokenizer / Analyzer? I know
> that there's a PositionIncrementAttribute and a PositionLengthAttribute, but is there an
> easy way to get the token position, or do I need to implement my own attribute based on
> one of the attributes mentioned above?
> 
> the reason I need it is that I wrote an implementation of a ShingleFilter which breaks
> shingles at punctuation, so the input [token number one, word two] will create the shingles
> [ "token number", "number one", "word two" ] -- but not [ "one word" ] because of the comma.
> I want it to break shingles at position-increment gaps as well.
> 
> thanks,
> 
> 
> Igal
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
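On the position question itself: OffsetAttribute reports character offsets into the original text (`startOffset()`, `endOffset()`). A token's position is normally obtained by keeping a running sum of PositionIncrementAttribute values while the stream is consumed. The sketch below models that in plain Java, with no Lucene calls. The `int[]` stands in for successive `getPositionIncrement()` values. The hypothetical `runs()` helper shows one way to break at increment gaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model, NOT Lucene code: each array element stands in for the value
// PositionIncrementAttribute.getPositionIncrement() reports for one token.
public class TokenPositions {
    // Absolute position of each token: a running sum of the increments,
    // starting from -1 so that a first token with increment 1 lands at 0.
    static int[] positions(int[] increments) {
        int[] out = new int[increments.length];
        int pos = -1;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            out[i] = pos;
        }
        return out;
    }

    // Hypothetical helper: groups token indexes into runs, starting a new run
    // wherever the increment is greater than 1 (a gap left by a removed token).
    static List<List<Integer>> runs(int[] increments) {
        List<List<Integer>> out = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int i = 0; i < increments.length; i++) {
            if (increments[i] > 1 && !current.isEmpty()) {
                out.add(current);
                current = new ArrayList<>();
            }
            current.add(i);
        }
        if (!current.isEmpty()) {
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] incs = {1, 1, 1, 2, 1}; // token number one <gap> word two
        System.out.println(Arrays.toString(positions(incs))); // [0, 1, 2, 4, 5]
        System.out.println(runs(incs)); // [[0, 1, 2], [3, 4]]
    }
}
```

An increment of 0 (e.g. a stacked synonym) keeps the same position, which the running sum handles without a special case. In a custom TokenFilter the same accumulation would happen inside `incrementToken()`, with the counter reset in `reset()`.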

---
Denis Bazhenov <dotsid@gmail.com>