lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8332) New ConcatenateGraphTokenStream (move/rename CompletionTokenStream)
Date Sat, 26 May 2018 04:12:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491501#comment-16491501
] 

David Smiley commented on LUCENE-8332:
--------------------------------------

I had TestRandomChains go at it and it uncovered a couple things.

{{org.apache.lucene.analysis.BaseTokenStreamTestCase#checkResetException}} has two checks:
 # ensures incrementToken() fails if reset() wasn't called first.  This was pretty straightforward
to fix by throwing an IllegalStateException at the start of ConcatenateGraphFilter.incrementToken.
 # ensures that if you forgot to close(), trying to get the tokenStream again fails.  This
one is tricky.  ConcatenateGraphFilter.reset() will consume the whole tokenStream, including
closing it... and it's hard to disagree with that.  It calls toAutomaton, which does this,
and there are even some callers of toAutomaton in the NRTSuggester that assume
the stream is going to be closed.  I think adding a closed flag isn't enough: when Analyzer.tokenStream()
is called we want it to fail, but all that does is set the reader (which throws if it wasn't
closed).  I could make toAutomaton not close the input, but then the callers need to deal
with that; I'm not sure which path to take, or whether I'm missing something.  Or maybe just punt and
have TestRandomChains ignore this check, as it's a bit too pedantic here?
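
For illustration only, the first check (failing when reset() was not called) can be sketched in plain Java without any Lucene dependency. The class ResetGuardedStream and its methods are hypothetical stand-ins, not the actual ConcatenateGraphFilter code; the sketch only assumes the guard is a simple state check at the top of incrementToken().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token stream; NOT a Lucene class. */
final class ResetGuardedStream {
    private final List<String> tokens;
    private Iterator<String> it;   // null until reset() has been called
    private String current;        // term produced by the last incrementToken()

    ResetGuardedStream(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    /** Prepares the stream for consumption; must precede incrementToken(). */
    void reset() {
        it = tokens.iterator();
    }

    /** Advances to the next token, or throws if reset() was skipped. */
    boolean incrementToken() {
        if (it == null) {
            throw new IllegalStateException(
                "reset() must be called before incrementToken()");
        }
        if (!it.hasNext()) {
            return false;
        }
        current = it.next();
        return true;
    }

    String term() {
        return current;
    }

    /** Releases the iterator so a later incrementToken() fails again. */
    void close() {
        it = null;
    }
}
```

The second check is not covered by this sketch: as described above, it depends on how Analyzer.tokenStream() and toAutomaton interact when the stream is closed.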

> New ConcatenateGraphTokenStream (move/rename CompletionTokenStream)
> -------------------------------------------------------------------
>
>                 Key: LUCENE-8332
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8332
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Let's move the CompletionTokenStream from the suggest module into the analysis
module and rename it ConcatenateGraphTokenStream. See comments in LUCENE-8323 leading to this
idea. Such a TokenStream (or TokenFilter?) has several uses:
>  * for the suggest module
>  * by the SolrTextTagger for NER/ERD use cases – SOLR-12376
>  * for doing complete match search efficiently
> It will need a factory – a TokenFilterFactory, even though we don't have a TokenFilter
based subclass of TokenStream.
> It appears there is no back-compat concern in it suddenly disappearing from the suggest
module as it's marked experimental and it only seems to be public now perhaps due to some
technicality (it has package level constructors).




