lucene-dev mailing list archives

From cutt...@apache.org
Subject cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis Token.java
Date Mon, 05 Aug 2002 17:39:04 GMT
cutting     2002/08/05 10:39:03

  Modified:    .        CHANGES.txt
               src/java/org/apache/lucene/analysis Token.java
  Log:
  Improved documentation.
  
  Revision  Changes    Path
  1.29      +16 -1     jakarta-lucene/CHANGES.txt
  
  Index: CHANGES.txt
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/CHANGES.txt,v
  retrieving revision 1.28
  retrieving revision 1.29
  diff -u -r1.28 -r1.29
  --- CHANGES.txt	29 Jul 2002 19:11:14 -0000	1.28
  +++ CHANGES.txt	5 Aug 2002 17:39:03 -0000	1.29
  @@ -58,6 +58,21 @@
        for longer fields.  Once the index is re-created, scores will be
        as before. (cutting)
   
  + 13. Added new method Token.setPositionIncrement().
  +
  +     This permits, for the purpose of phrase searching, placing
  +     multiple terms in a single position.  This is useful with
  +     stemmers that produce multiple possible stems for a word.
  +
  +     This also permits the introduction of gaps between terms, so that
  +     terms which are adjacent in a token stream will not be matched by
  +     an exact phrase query.  This makes it possible, e.g., to build
  +     an analyzer where phrases are not matched over stop words which
  +     have been removed.
  +
  +     Finally, repeating a token with an increment of zero can also be
  +     used to boost scores of matches on that token.
  +
   
   1.2 RC6
   
  
  
  
  1.3       +13 -9     jakarta-lucene/src/java/org/apache/lucene/analysis/Token.java
  
  Index: Token.java
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/analysis/Token.java,v
  retrieving revision 1.2
  retrieving revision 1.3
  diff -u -r1.2 -r1.3
  --- Token.java	5 Aug 2002 17:14:59 -0000	1.2
  +++ Token.java	5 Aug 2002 17:39:03 -0000	1.3
  @@ -54,6 +54,8 @@
    * <http://www.apache.org/>.
    */
   
  +import org.apache.lucene.index.TermPositions;
  +
   /** A Token is an occurrence of a term from the text of a field.  It consists of
     a term's text, the start and end offset of the term in the text of the field,
     and a type string.
  @@ -98,19 +100,21 @@
      *
      * <p>The default value is one.
      *
  -   * <p>Two common uses for this are:<ul>
  +   * <p>Some common uses for this are:<ul>
      *
      * <li>Set it to zero to put multiple terms in the same position.  This is
  -   * useful if, e.g., when a word has multiple stems.  This way searches for
  -   * phrases including either stem will match this occurence.  In this case,
  -   * all but the first stem's increment should be set to zero: the increment of
  -   * the first instance should be one.
  +   * useful if, e.g., a word has multiple stems.  Searches for phrases
  +   * including either stem will match.  In this case, all but the first stem's
  +   * increment should be set to zero: the increment of the first instance
  +   * should be one.  Repeating a token with an increment of zero can also be
  +   * used to boost the scores of matches on that token.
      *
      * <li>Set it to values greater than one to inhibit exact phrase matches.
  -   * If, for example, one does not want phrases to match across stop words,
  -   * then one could build a stop word filter that removes stop words and also
  -   * sets the increment to the number of stop words removed before each
  -   * non-stop word.
  +   * If, for example, one does not want phrases to match across removed stop
  +   * words, then one could build a stop word filter that removes stop words and
  +   * also sets the increment to the number of stop words removed before each
  +   * non-stop word.  Then exact phrase queries will only match when the terms
  +   * occur with no intervening stop words.
      *
      * </ul>
      * @see TermPositions
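
The two uses documented above can be illustrated with a small, self-contained
sketch. It does not use Lucene: Tok, removeStopWords, positions and
matchesExactPhrase are hypothetical names introduced only for illustration,
and the rule that a token's position is the running sum of increments (so a
first increment of one lands on position 0) is an assumption of this sketch,
not a statement about the indexing code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Self-contained sketch; Tok is a hypothetical stand-in, not Lucene's Token.
final class PositionIncrementSketch {

    /** A term plus its distance from the previous token's position. */
    static final class Tok {
        final String text;
        final int increment;

        Tok(String text, int increment) {
            if (increment < 0) {
                throw new IllegalArgumentException("increment must be >= 0");
            }
            this.text = text;
            this.increment = increment;
        }
    }

    /** Drops stop words; each kept token gets 1 plus the number of stop words dropped just before it. */
    static List<Tok> removeStopWords(List<String> words, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (String word : words) {
            if (stopWords.contains(word)) {
                skipped++;
                continue;
            }
            out.add(new Tok(word, 1 + skipped));
            skipped = 0;
        }
        return out;
    }

    /** Assumed rule: position = running sum of increments, minus one. */
    static int[] positions(List<Tok> tokens) {
        int[] pos = new int[tokens.size()];
        int current = -1;
        for (int i = 0; i < tokens.size(); i++) {
            current += tokens.get(i).increment;
            pos[i] = current;
        }
        return pos;
    }

    /** True if some token with the given text sits at the given position. */
    static boolean termAt(List<Tok> tokens, int[] pos, String term, int position) {
        for (int i = 0; i < tokens.size(); i++) {
            if (pos[i] == position && tokens.get(i).text.equals(term)) {
                return true;
            }
        }
        return false;
    }

    /** Exact phrase match: the phrase terms must occupy consecutive positions. */
    static boolean matchesExactPhrase(List<Tok> tokens, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        int[] pos = positions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).text.equals(phrase[0])) {
                continue;
            }
            boolean ok = true;
            for (int k = 1; k < phrase.length && ok; k++) {
                ok = termAt(tokens, pos, phrase[k], pos[i] + k);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two stems of one word share a position: the second has increment zero.
        List<Tok> stems = Arrays.asList(
                new Tok("quick", 1), new Tok("running", 1), new Tok("run", 0), new Tok("fox", 1));
        System.out.println(matchesExactPhrase(stems, "quick", "run"));     // true
        System.out.println(matchesExactPhrase(stems, "running", "fox"));   // true

        // A removed stop word leaves a gap, so the exact phrase no longer matches.
        List<Tok> gapped = removeStopWords(
                Arrays.asList("rock", "and", "roll"), new HashSet<>(Arrays.asList("and")));
        System.out.println(matchesExactPhrase(gapped, "rock", "roll"));    // false
    }
}
```

In this sketch "rock" and "roll" end up at positions 0 and 2, so an exact
phrase query for "rock roll" finds nothing at position 1 and fails, while both
stems at position 1 can complete a phrase that starts at "quick".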
  
  
  


