lucene-dev mailing list archives

From dsmiley <...@git.apache.org>
Subject [GitHub] lucene-solr pull request #384: LUCENE-8332 move CompletionTokenStream to Con...
Date Wed, 30 May 2018 03:25:41 GMT
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/384#discussion_r191633254
  
    --- Diff: lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ConcatenateGraphFilter.java ---
    @@ -31,80 +33,106 @@
     import org.apache.lucene.util.IOUtils;
     import org.apache.lucene.util.IntsRef;
     import org.apache.lucene.util.automaton.Automaton;
    -import org.apache.lucene.util.automaton.FiniteStringsIterator;
     import org.apache.lucene.util.automaton.LimitedFiniteStringsIterator;
     import org.apache.lucene.util.automaton.Operations;
    +import org.apache.lucene.util.automaton.TooComplexToDeterminizeException;
     import org.apache.lucene.util.automaton.Transition;
     import org.apache.lucene.util.fst.Util;
     
    -import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_MAX_GRAPH_EXPANSIONS;
    -import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_PRESERVE_POSITION_INCREMENTS;
    -import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.DEFAULT_PRESERVE_SEP;
    -import static org.apache.lucene.search.suggest.document.CompletionAnalyzer.SEP_LABEL;
    -
     /**
    - * Token stream which converts a provided token stream to an automaton.
    - * The accepted strings enumeration from the automaton are available through the
    - * {@link org.apache.lucene.analysis.tokenattributes.TermToBytesRefAttribute} attribute
    - * The token stream uses a {@link org.apache.lucene.analysis.tokenattributes.PayloadAttribute} to store
    - * a completion's payload (see {@link CompletionTokenStream#setPayload(org.apache.lucene.util.BytesRef)})
    + * Concatenates/Joins every incoming token with a separator into one output token for every path through the
    + * token stream (which is a graph).  In simple cases this yields one token, but in the presence of any tokens with
    + * a zero positionIncrement (e.g. synonyms) it will be more.  This filter uses the token bytes, position increment,
    + * and position length of the incoming stream.  Other attributes are not used or manipulated.
      *
      * @lucene.experimental
      */
    -public final class CompletionTokenStream extends TokenStream {
    +public final class ConcatenateGraphFilter extends TokenFilter {
    --- End diff --
    
    I do tend to widen the scope a bit once I get my hands on things, but my intention is a better result ("better" being in the eye of the beholder, of course).  I caught that close() issue locally as well but forgot to mention it.  How about this: unless anyone objects, I'll change it to extend TokenStream directly and move consumption of the input into incrementToken(), but keep the name, since the parent class is an implementation detail.  ConcatenateGraphFilterFactory would then produce a ConcatenateGraphFilter that happens to subclass TokenStream directly.  WDYT?
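
To make the proposal concrete, here is a rough, self-contained sketch of the idea: the input is consumed lazily on the first incrementToken() call, and one joined token is then emitted per path through the token graph. Everything in it is an assumption for illustration only. ConcatenateSketch, Tok, and term() are hypothetical stand-ins and not Lucene API; the input is a plain list rather than a wrapped TokenStream; a visible separator is used instead of the real separator label; and holes in the position sequence, the max-expansions limit, and the automaton-based enumeration are all ignored.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch only: models the lazy-consumption idea without any Lucene classes. */
final class ConcatenateSketch {

  /** Hypothetical token: term text, position increment, position length (assumed >= 1). */
  static final class Tok {
    final String term;
    final int posInc;
    final int posLen;

    Tok(String term, int posInc, int posLen) {
      this.term = term;
      this.posInc = posInc;
      this.posLen = posLen;
    }
  }

  private final List<Tok> input;
  private final char sep;
  private Iterator<String> paths; // stays null until the first incrementToken() call
  private String current;

  ConcatenateSketch(List<Tok> input, char sep) {
    this.input = input;
    this.sep = sep;
  }

  /** Consumes the whole input on the first call, then emits one joined token per call. */
  boolean incrementToken() {
    if (paths == null) {
      paths = enumeratePaths().iterator();
    }
    if (!paths.hasNext()) {
      return false;
    }
    current = paths.next();
    return true;
  }

  /** The joined token produced by the last successful incrementToken(). */
  String term() {
    return current;
  }

  /** Maps each token to a (start, end) position pair and walks every path from 0 to the last position. */
  private List<String> enumeratePaths() {
    List<String> out = new ArrayList<>();
    int n = input.size();
    if (n == 0) {
      return out;
    }
    int[] start = new int[n];
    int[] end = new int[n];
    int pos = -1;
    int last = 0;
    for (int i = 0; i < n; i++) {
      Tok t = input.get(i);
      pos += t.posInc;         // a zero increment stacks the token on the previous position
      start[i] = pos;
      end[i] = pos + t.posLen; // position length says where the token's arc ends
      last = Math.max(last, end[i]);
    }
    walk(0, last, "", start, end, out);
    return out;
  }

  /** Depth-first walk: extend the prefix with every token that starts at position 'at'. */
  private void walk(int at, int last, String prefix, int[] start, int[] end, List<String> out) {
    if (at == last) {
      out.add(prefix);
      return;
    }
    for (int i = 0; i < input.size(); i++) {
      if (start[i] == at) {
        String term = input.get(i).term;
        walk(end[i], last, at == 0 ? term : prefix + sep + term, start, end, out);
      }
    }
  }
}
```

With the tokens fast (posInc=1), quick (posInc=0, i.e. a synonym of fast), and car (posInc=1), all with position length 1 and a space as the separator, repeated incrementToken() calls yield "fast car" and then "quick car". Nothing is read from the input before that first call, which is the point of moving consumption out of the constructor/reset path.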


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

