lucene-java-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: intra-word delimiters
Date Tue, 16 Aug 2005 03:53:52 GMT

On Aug 15, 2005, at 7:47 PM, Yonik Seeley wrote:

> That was the plan, but step (4) really seems problematic.
>
> - term expansion this way can lead to a lot of false matches
> - phrase queries with many bordering words break
> - setting term positions such that phrase queries work on all combos
> of subwords is non-trivial.

Tag every term with its length in tokens.  :)

Index at these positions.

Pos0: a ab abc abcd
Pos1: b bc bcd
Pos2: c cd
Pos3: d
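
A minimal sketch of how that table could be generated, written in Python for brevity rather than against Lucene's Java API. The function name `subword_postings` and the (term, token length) tuple layout are assumptions made for illustration; nothing here is an existing Lucene call.

```python
def subword_postings(subwords):
    """Map each start position to the (term, token_length) pairs indexed there.

    Every contiguous run of subwords beginning at a position is joined into
    one term and tagged with the number of subwords it spans.
    """
    postings = {}
    for start in range(len(subwords)):
        postings[start] = [
            ("".join(subwords[start:end]), end - start)
            for end in range(start + 1, len(subwords) + 1)
        ]
    return postings


# Reproduce the table for a word split into the subwords a, b, c, d.
for pos, terms in subword_postings(["a", "b", "c", "d"]).items():
    print(f"Pos{pos}:", " ".join(term for term, _ in terms))
```

A word with n subwords yields n(n+1)/2 terms this way, so the scheme trades index size for matching on any run of subwords.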

Create a phrase query that, when it encounters ab => { tokenlength =>
2 }, knows to look for the next term at position 2, i.e. at ab's own
position plus its token length.
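
A rough sketch of that matching rule, again in Python and under stated assumptions: the `INDEX` dictionary is just the zero-based position table from this message written out by hand, and `phrase_matches` is a hypothetical helper, not a Lucene class. Each term's successor is expected at the term's own position plus its token length, so `ab` at Pos0 with token length 2 puts the next term at Pos2.

```python
# Position table from the message: term -> (position, token_length).
# Simplification: one document and one occurrence per term. A real index
# would keep a postings list of positions per term and per document.
INDEX = {
    "a": (0, 1), "ab": (0, 2), "abc": (0, 3), "abcd": (0, 4),
    "b": (1, 1), "bc": (1, 2), "bcd": (1, 3),
    "c": (2, 1), "cd": (2, 2),
    "d": (3, 1),
}


def phrase_matches(terms, index=INDEX):
    """Return True if the terms occur back to back in the indexed word."""
    expected = None  # position where the next term has to start
    for term in terms:
        if term not in index:
            return False
        position, token_length = index[term]
        if expected is not None and position != expected:
            return False
        # Skip ahead by the number of subword tokens this term covers.
        expected = position + token_length
    return True


print(phrase_matches(["ab", "cd"]))  # cd starts where ab ends
print(phrase_matches(["ab", "d"]))   # gap: c is skipped
print(phrase_matches(["ab", "bc"]))  # overlap: both terms cover b
```

Advancing by token length rather than by one is what keeps a phrase query from accepting overlapping or gapped combinations of subwords.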

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

