lucene-java-user mailing list archives

From Pierrick Brihaye <pierrick.brih...@culture.gouv.fr>
Subject Re: positional token info
Date Tue, 21 Oct 2003 07:36:14 GMT
Hi,

Erik Hatcher wrote:

> Is anyone doing anything interesting with the Token.setPositionIncrement 
> during analysis?

I think so :-) Well... my Arabic analyzer is based on this functionality.

The basic idea is to have several tokens at the same position (i.e. 
setPositionIncrement(0)) which are different possible stems for the same 
word.
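
As a rough illustration of stacking several stems at one position, here is a minimal sketch. It does not use Lucene's classes: `Tok`, `analyze`, `positions` and the stem table are hypothetical stand-ins, and the stems are placeholders rather than real Arabic stems. The only behaviour it assumes is that a token's position is the running sum of the increments, with the first token at position 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Tok is NOT Lucene's Token class.
class StackedStemsSketch {
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Emits every candidate stem of each word. The first stem advances the
    // position by 1; the remaining stems get increment 0 and so share it.
    static List<Tok> analyze(List<String> words, Map<String, List<String>> stems) {
        List<Tok> out = new ArrayList<>();
        for (String word : words) {
            // Words without an entry in the (assumed) stem table pass through as-is.
            List<String> candidates = stems.getOrDefault(word, List.of(word));
            int increment = 1;
            for (String stem : candidates) {
                out.add(new Tok(stem, increment));
                increment = 0;
            }
        }
        return out;
    }

    // Absolute positions: running sum of increments, first token at 0.
    static int[] positions(List<Tok> tokens) {
        int[] result = new int[tokens.size()];
        int position = -1;
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).positionIncrement;
            result[i] = position;
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder words and stems, for illustration only.
        Map<String, List<String>> stems = Map.of("w1", List.of("stemA", "stemB"));
        List<Tok> tokens = analyze(List.of("w1", "w2"), stems);
        int[] pos = positions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(pos[i] + "\t" + tokens.get(i).text);
        }
    }
}
```

Here "stemA" and "stemB" both land on position 0 and "w2" on position 1, so a positional query could match either stem where the original word stood.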

> But its practically impossible to formulate a Query that can take 
> advantage of this.  A PhraseQuery, because Terms don't have positional 
> info (only the transient tokens)

Correct!

I've made a dirty patch for the QueryParser which is able to handle 
tokens with positionIncrement equal to 0 or 1 (see bug #23307). It still 
needs some work, but it fits my needs :-)

> I certainly see the benefit of putting tokens into zero-increment 
> positions, but are increments of 2 or more at all useful?

Who knows? It may be interesting to keep track of the *presence* of 
"empty words", e.g. "[the] sky [is] blue", "[the] sky [is] [really] 
blue", "[the] sky [is] [that] [really] blue". The traditional reduction 
to "sky blue" may be over-simplistic in some cases...

Well, just an idea.
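
To make the idea concrete, here is a minimal sketch of recording dropped "empty words" as a larger increment on the next kept token. It is not Lucene code: `Tok`, `analyze` and the stop-word set are hypothetical, and it makes no claim about how any existing Lucene filter behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; Tok is NOT Lucene's Token class.
class StopGapSketch {
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Drops stop words, but adds one to the next kept token's increment
    // for every word dropped in front of it.
    static List<Tok> analyze(List<String> words, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (String word : words) {
            if (stopWords.contains(word)) {
                skipped++;
                continue;
            }
            out.add(new Tok(word, skipped + 1));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed stop-word list, for illustration only.
        Set<String> stop = Set.of("the", "is", "really", "that");
        List<List<String>> sentences = List.of(
            List.of("the", "sky", "is", "blue"),
            List.of("the", "sky", "is", "really", "blue"),
            List.of("the", "sky", "is", "that", "really", "blue"));
        for (List<String> sentence : sentences) {
            StringBuilder line = new StringBuilder();
            for (Tok t : analyze(sentence, stop)) {
                line.append(t.text).append("(+").append(t.positionIncrement).append(") ");
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

With this scheme "blue" gets an increment of 2, 3 and 4 in the three sentences, so the index still records how many words stood between "sky" and "blue" even though the words themselves are gone.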

Cheers,

-- 
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:pierrick.brihaye@culture.gouv.fr

