lucene-java-commits mailing list archives

From mikemcc...@apache.org
Subject svn commit: r829766 - in /lucene/java/branches/lucene_2_9: ./ src/java/org/apache/lucene/analysis/
Date Mon, 26 Oct 2009 12:23:58 GMT
Author: mikemccand
Date: Mon Oct 26 12:23:58 2009
New Revision: 829766

URL: http://svn.apache.org/viewvc?rev=829766&view=rev
Log:
LUCENE-2008 (on 2.9 branch): improve javadocs for TokenStream/Tokenizer/Token

Modified:
    lucene/java/branches/lucene_2_9/CHANGES.txt
    lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TeeSinkTokenFilter.java
    lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/Token.java
    lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenFilter.java
    lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenStream.java
    lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/Tokenizer.java

Modified: lucene/java/branches/lucene_2_9/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/CHANGES.txt?rev=829766&r1=829765&r2=829766&view=diff
==============================================================================
--- lucene/java/branches/lucene_2_9/CHANGES.txt (original)
+++ lucene/java/branches/lucene_2_9/CHANGES.txt Mon Oct 26 12:23:58 2009
@@ -55,6 +55,9 @@
  * Fix javadoc about score tracking done by search methods in Searcher 
    and IndexSearcher.  (Mike McCandless)
 
+ * LUCENE-2008: Javadoc improvements for TokenStream/Tokenizer/Token
+   (Luke Nezda via Mike McCandless)
+
  ======================= Release 2.9.0 2009-09-23 =======================
 
 Changes in backwards compatibility policy

Modified: lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TeeSinkTokenFilter.java
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TeeSinkTokenFilter.java?rev=829766&r1=829765&r2=829766&view=diff
==============================================================================
--- lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TeeSinkTokenFilter.java (original)
+++ lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TeeSinkTokenFilter.java Mon Oct 26 12:23:58 2009
@@ -53,7 +53,7 @@
 d.add(new Field("f3", final3));
 d.add(new Field("f4", final4));
  * </pre>
- * In this example, <code>sink1</code> and <code>sink2<code> will both get tokens from both
+ * In this example, <code>sink1</code> and <code>sink2</code> will both get tokens from both
  * <code>reader1</code> and <code>reader2</code> after whitespace tokenizer
  * and now we can further wrap any of these in extra analysis, and more "sources" can be inserted if desired.
  * It is important, that tees are consumed before sinks (in the above example, the field names must be

Modified: lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/Token.java
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/Token.java?rev=829766&r1=829765&r2=829766&view=diff
==============================================================================
--- lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/Token.java (original)
+++ lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/Token.java Mon Oct 26 12:23:58 2009
@@ -36,7 +36,7 @@
   <p>
   The start and end offsets permit applications to re-associate a token with
   its source text, e.g., to display highlighted query terms in a document
-  browser, or to show matching text fragments in a KWIC (KeyWord In Context)
+  browser, or to show matching text fragments in a <abbr title="KeyWord In Context">KWIC</abbr>
   display, etc.
   <p>
   The type is a string, assigned by a lexical analyzer
@@ -70,9 +70,9 @@
   associated performance cost has been added (below).  The
   {@link #termText()} method has been deprecated.</p>
   
-  <p>Tokenizers and filters should try to re-use a Token
+  <p>Tokenizers and TokenFilters should try to re-use a Token
   instance when possible for best performance, by
-  implementing the {@link TokenStream#next(Token)} API.
+  implementing the {@link TokenStream#incrementToken()} API.
   Failing that, to create a new Token you should first use
   one of the constructors that starts with null text.  To load
   the token from a char[] use {@link #setTermBuffer(char[], int, int)}.
@@ -86,30 +86,30 @@
   set the length of the term text.  See <a target="_top"
   href="https://issues.apache.org/jira/browse/LUCENE-969">LUCENE-969</a>
   for details.</p>
-  <p>Typical reuse patterns:
+  <p>Typical Token reuse patterns:
   <ul>
-  <li> Copying text from a string (type is reset to #DEFAULT_TYPE if not specified):<br/>
+  <li> Copying text from a string (type is reset to {@link #DEFAULT_TYPE} if not specified):<br/>
   <pre>
     return reusableToken.reinit(string, startOffset, endOffset[, type]);
   </pre>
   </li>
-  <li> Copying some text from a string (type is reset to #DEFAULT_TYPE if not specified):<br/>
+  <li> Copying some text from a string (type is reset to {@link #DEFAULT_TYPE} if not specified):<br/>
   <pre>
     return reusableToken.reinit(string, 0, string.length(), startOffset, endOffset[, type]);
   </pre>
   </li>
   </li>
-  <li> Copying text from char[] buffer (type is reset to #DEFAULT_TYPE if not specified):<br/>
+  <li> Copying text from char[] buffer (type is reset to {@link #DEFAULT_TYPE} if not specified):<br/>
   <pre>
     return reusableToken.reinit(buffer, 0, buffer.length, startOffset, endOffset[, type]);
   </pre>
   </li>
-  <li> Copying some text from a char[] buffer (type is reset to #DEFAULT_TYPE if not specified):<br/>
+  <li> Copying some text from a char[] buffer (type is reset to {@link #DEFAULT_TYPE} if not specified):<br/>
   <pre>
     return reusableToken.reinit(buffer, start, end - start, startOffset, endOffset[, type]);
   </pre>
   </li>
-  <li> Copying from one one Token to another (type is reset to #DEFAULT_TYPE if not specified):<br/>
+  <li> Copying from one one Token to another (type is reset to {@link #DEFAULT_TYPE} if not specified):<br/>
   <pre>
     return reusableToken.reinit(source.termBuffer(), 0, source.termLength(), source.startOffset(), source.endOffset()[, source.type()]);
   </pre>
@@ -119,7 +119,7 @@
   <ul>
   <li>clear() initializes all of the fields to default values. This was changed in contrast to Lucene 2.4, but should affect no one.</li>
   <li>Because <code>TokenStreams</code> can be chained, one cannot assume that the <code>Token's</code> current type is correct.</li>
-  <li>The startOffset and endOffset represent the start and offset in the source text. So be careful in adjusting them.</li>
+  <li>The startOffset and endOffset represent the start and offset in the source text, so be careful in adjusting them.</li>
   <li>When caching a reusable token, clone it. When injecting a cached token into a stream that can be reset, clone it again.</li>
   </ul>
   </p>

Modified: lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenFilter.java
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenFilter.java?rev=829766&r1=829765&r2=829766&view=diff
==============================================================================
--- lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenFilter.java (original)
+++ lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenFilter.java Mon Oct 26 12:23:58 2009
@@ -19,15 +19,10 @@
 
 import java.io.IOException;
 
-/** A TokenFilter is a TokenStream whose input is another token stream.
+/** A TokenFilter is a TokenStream whose input is another TokenStream.
   <p>
-  This is an abstract class.
-  NOTE: subclasses must override 
-  {@link #incrementToken()} if the new TokenStream API is used
-  and {@link #next(Token)} or {@link #next()} if the old
-  TokenStream API is used.
-  <p>
-  See {@link TokenStream}
+  This is an abstract class; subclasses must override {@link #incrementToken()}.
+  @see TokenStream
   */
 public abstract class TokenFilter extends TokenStream {
   /** The source of tokens for this filter. */

Modified: lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenStream.java
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenStream.java?rev=829766&r1=829765&r2=829766&view=diff
==============================================================================
--- lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenStream.java (original)
+++ lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/TokenStream.java Mon Oct 26 12:23:58 2009
@@ -38,14 +38,14 @@
  * A <code>TokenStream</code> enumerates the sequence of tokens, either from
  * {@link Field}s of a {@link Document} or from query text.
  * <p>
- * This is an abstract class. Concrete subclasses are:
+ * This is an abstract class; concrete subclasses are:
  * <ul>
  * <li>{@link Tokenizer}, a <code>TokenStream</code> whose input is a Reader; and
  * <li>{@link TokenFilter}, a <code>TokenStream</code> whose input is another
  * <code>TokenStream</code>.
  * </ul>
  * A new <code>TokenStream</code> API has been introduced with Lucene 2.9. This API
- * has moved from being {@link Token} based to {@link Attribute} based. While
+ * has moved from being {@link Token}-based to {@link Attribute}-based. While
  * {@link Token} still exists in 2.9 as a convenience class, the preferred way
  * to store the information of a {@link Token} is to use {@link AttributeImpl}s.
  * <p>
@@ -61,14 +61,14 @@
  * <li>Instantiation of <code>TokenStream</code>/{@link TokenFilter}s which add/get
  * attributes to/from the {@link AttributeSource}.
  * <li>The consumer calls {@link TokenStream#reset()}.
- * <li>the consumer retrieves attributes from the stream and stores local
- * references to all attributes it wants to access
- * <li>The consumer calls {@link #incrementToken()} until it returns false and
- * consumes the attributes after each call.
+ * <li>The consumer retrieves attributes from the stream and stores local
+ * references to all attributes it wants to access.
+ * <li>The consumer calls {@link #incrementToken()} until it returns false
+ * consuming the attributes after each call.
  * <li>The consumer calls {@link #end()} so that any end-of-stream operations
  * can be performed.
  * <li>The consumer calls {@link #close()} to release any resource when finished
- * using the <code>TokenStream</code>
+ * using the <code>TokenStream</code>.
  * </ol>
  * To make sure that filters and consumers know which attributes are available,
  * the attributes must be added during instantiation. Filters and consumers are
@@ -78,8 +78,8 @@
  * You can find some example code for the new API in the analysis package level
  * Javadoc.
  * <p>
- * Sometimes it is desirable to capture a current state of a <code>TokenStream</code>
- * , e. g. for buffering purposes (see {@link CachingTokenFilter},
+ * Sometimes it is desirable to capture a current state of a <code>TokenStream</code>,
+ * e.g., for buffering purposes (see {@link CachingTokenFilter},
  * {@link TeeSinkTokenFilter}). For this usecase
  * {@link AttributeSource#captureState} and {@link AttributeSource#restoreState}
  * can be used.
@@ -283,7 +283,7 @@
   }
   
   /**
-   * Consumers (ie {@link IndexWriter}) use this method to advance the stream to
+   * Consumers (i.e., {@link IndexWriter}) use this method to advance the stream to
    * the next token. Implementing classes must implement this method and update
    * the appropriate {@link AttributeImpl}s with the attributes of the next
    * token.

Modified: lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/Tokenizer.java
URL: http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/Tokenizer.java?rev=829766&r1=829765&r2=829766&view=diff
==============================================================================
--- lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/Tokenizer.java (original)
+++ lucene/java/branches/lucene_2_9/src/java/org/apache/lucene/analysis/Tokenizer.java Mon Oct 26 12:23:58 2009
@@ -24,20 +24,14 @@
 
 /** A Tokenizer is a TokenStream whose input is a Reader.
   <p>
-  This is an abstract class.
-  <p>
-  NOTE: subclasses must override 
-  {@link #incrementToken()} if the new TokenStream API is used
-  and {@link #next(Token)} or {@link #next()} if the old
-  TokenStream API is used.
+  This is an abstract class; subclasses must override {@link #incrementToken()}
   <p>
   NOTE: Subclasses overriding {@link #incrementToken()} must
   call {@link AttributeSource#clearAttributes()} before
   setting attributes.
-  Subclasses overriding {@link #next(Token)} must call
+  Subclasses overriding {@link #incrementToken()} must call
   {@link Token#clear()} before setting Token attributes. 
  */
-
 public abstract class Tokenizer extends TokenStream {
   /** The text source for this Tokenizer. */
   protected Reader input;
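
Note on the consumer workflow described in the TokenStream javadoc in this commit (reset(), fetch attribute references, loop on incrementToken(), end(), close()): the calling sequence can be illustrated with a minimal, self-contained sketch. It deliberately does not use Lucene. SketchTermAttribute, SketchWhitespaceStream and WorkflowSketch are hypothetical stand-ins that only mirror the method names of the documented contract, so this should be read as an illustration of the call order and of attribute reuse, not as the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a term attribute: a single mutable
// instance that the stream overwrites for every token.
final class SketchTermAttribute {
    private String term = "";
    void setTerm(String t) { term = t; }
    String term() { return term; }
    void clear() { term = ""; }
}

// Hypothetical stand-in for a Tokenizer that splits text on whitespace.
final class SketchWhitespaceStream {
    private final String text;
    private final SketchTermAttribute termAtt = new SketchTermAttribute();
    private int pos = 0;
    private boolean closed = false;

    SketchWhitespaceStream(String text) { this.text = text; }

    // Attributes exist from instantiation, so consumers can fetch them up front.
    SketchTermAttribute termAttribute() { return termAtt; }

    void reset() { pos = 0; }

    // Advances to the next token; clears the attribute before setting it.
    boolean incrementToken() {
        termAtt.clear();
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos >= text.length()) return false;
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
        termAtt.setTerm(text.substring(start, pos));
        return true;
    }

    void end() { pos = text.length(); }   // end-of-stream bookkeeping
    void close() { closed = true; }       // release resources
    boolean isClosed() { return closed; }
}

public final class WorkflowSketch {
    // Follows the documented order: reset, get attribute reference,
    // incrementToken loop, end, close.
    static List<String> consume(SketchWhitespaceStream stream) {
        List<String> terms = new ArrayList<>();
        stream.reset();
        SketchTermAttribute termAtt = stream.termAttribute(); // fetched once, reused
        try {
            while (stream.incrementToken()) {
                terms.add(termAtt.term()); // read the value after each call
            }
            stream.end();
        } finally {
            stream.close();
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(consume(new SketchWhitespaceStream("  quick brown  fox ")));
    }
}
```

The consumer holds one reference to the attribute and reads it after every incrementToken() call, which is why the value has to be copied out (here, into the list) before the next call overwrites it.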


