lucene-dev mailing list archives

From Halácsy Péter <halacsy.pe...@axelero.com>
Subject RE: Relevance boosting with the aid of semantic markup
Date Thu, 13 Dec 2001 22:58:55 GMT


> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: Thursday, December 06, 2001 5:54 PM
> To: 'Lucene Developers List'
> Subject: RE: Relevance boosting with the aid of semantic markup
> 
> I made a proposal a while back which could also be used to achieve this.
> It is not the most elegant solution, but a solution nonetheless.
> 
> The proposal was to add a field to Token, as follows:
>   private int positionIncrement = 1;
>   public int getPositionIncrement() { return positionIncrement; }
>   public void setPositionIncrement(int pi) {
>     if (pi < 0)
>       throw new IllegalArgumentException("positionIncrement cannot be negative");
>     positionIncrement = pi;
>   }

My question is: would this have any effect on queries? What happens if an
analyzer produces more than one token at the same position?

thanks
peter

> 
> This would be used when indexing to determine a token's position relative to
> the previous token in the stream, for the purposes of phrase searching, as
> in the following diff:
> 
> --- DocumentWriter.java	2001/09/18 16:29:52	1.1.1.1
> +++ DocumentWriter.java	2001/12/06 16:24:34
> @@ -159,7 +159,8 @@
>  	  TokenStream stream = analyzer.tokenStream(fieldName, reader);
>  	  try {
>  	    for (Token t = stream.next(); t != null; t = stream.next()) {
> -	      addPosition(fieldName, t.termText(), position++);
> +	      addPosition(fieldName, t.termText(), position);
> +              position += t.getPositionIncrement();
>  	      if (position > maxFieldLength) break;
>  	    }
> 
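
To make the question about several tokens at one position concrete, here is a minimal, self-contained sketch of how the loop in the quoted diff would assign positions. It is only an illustration under stated assumptions: the Token class below is a hypothetical stand-in (not Lucene's real class), assignPositions is an invented helper that replaces addPosition, and the maxFieldLength check is left out. Note that, as the diff is written, a token's increment is added after its own position has been recorded, so an increment of 0 makes the following token share that position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionIncrementSketch {

    /** Hypothetical stand-in for Token, carrying only what this sketch needs. */
    static final class Token {
        private final String text;
        private int positionIncrement = 1;

        Token(String text, int positionIncrement) {
            this.text = text;
            setPositionIncrement(positionIncrement);
        }

        String termText() { return text; }

        int getPositionIncrement() { return positionIncrement; }

        void setPositionIncrement(int pi) {
            if (pi < 0)
                throw new IllegalArgumentException("positionIncrement cannot be negative");
            positionIncrement = pi;
        }
    }

    /**
     * Mirrors the loop in the quoted diff: record the current position,
     * then advance it by the token's increment. Returns "term@position"
     * strings instead of writing postings (an assumption for illustration).
     */
    static List<String> assignPositions(List<Token> tokens) {
        List<String> postings = new ArrayList<>();
        int position = 0;
        for (Token t : tokens) {
            postings.add(t.termText() + "@" + position);
            position += t.getPositionIncrement();
        }
        return postings;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
            new Token("quick", 0),  // increment 0: the next token lands on the same position
            new Token("fast", 1),
            new Token("fox", 1));
        // prints [quick@0, fast@0, fox@1]
        System.out.println(assignPositions(tokens));
    }
}
```

In this sketch "quick" and "fast" end up at the same position, so a phrase search that compares positions would presumably see either term as immediately preceding "fox". Whether the query side handles that correctly is exactly the open question above.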
