lucene-dev mailing list archives

From "Jack Krupansky (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5012) Make graph-based TokenFilters easier
Date Tue, 21 May 2013 20:27:16 GMT


Jack Krupansky commented on LUCENE-5012:

Will this Jira include some test code that query parsers can use to retrieve the graph
for a stream containing multiple multi-term synonyms, so that they can then "sausage"
each term sequence individually and generate "OR" operators across the resulting strings
of sausages?

> Make graph-based TokenFilters easier
> ------------------------------------
>                 Key: LUCENE-5012
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5012.patch
> SynonymFilter has two limitations today:
>   * It cannot create positions, so e.g. dns -> domain name service
>     creates blatantly wrong highlights (SOLR-3390, LUCENE-4499 and
>     others).
>   * It cannot consume a graph, so e.g. if you try to apply synonyms
>     after Kuromoji tokenizer I'm not sure what will happen.
> I've thought about how to fix these issues but it's really quite
> difficult with the current PosInc/PosLen graph representation, so I'd
> like to explore an alternative approach.
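
To make the question above concrete, here is a minimal sketch of how a query
parser might walk a PosInc/PosLen token graph, pull out each path (one
"sausage" per path), and OR the paths together. It does not use Lucene at all:
the Token and Arc classes, the method names, and the quoted-phrase OR syntax
are all hypothetical, and the input is the idealized graph for
dns -> domain name service that the issue says SynonymFilter cannot produce
today.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Lucene API: plain classes stand in for the
// term, position increment and position length of each token.
final class TokenGraphPaths {

    static final class Token {
        final String term;
        final int posInc; // distance from the previous token's start position
        final int posLen; // number of positions this token spans
        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    static final class Arc {
        final int from;
        final int to;
        final String term;
        Arc(int from, int to, String term) {
            this.from = from;
            this.to = to;
            this.term = term;
        }
    }

    // Decode PosInc/PosLen into arcs: a token starts at the running position
    // and ends posLen positions later.
    static List<Arc> toArcs(List<Token> tokens) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.posInc;
            arcs.add(new Arc(pos, pos + t.posLen, t.term));
        }
        return arcs;
    }

    // Enumerate every path from the first start position to the last end
    // position. Each path is one linear term sequence.
    static List<List<String>> enumeratePaths(List<Token> tokens) {
        List<Arc> arcs = toArcs(tokens);
        List<List<String>> out = new ArrayList<>();
        if (arcs.isEmpty()) {
            return out;
        }
        int start = arcs.get(0).from;
        int end = 0;
        for (Arc a : arcs) {
            end = Math.max(end, a.to);
        }
        walk(arcs, start, end, new ArrayList<String>(), out);
        return out;
    }

    private static void walk(List<Arc> arcs, int node, int end,
                             List<String> prefix, List<List<String>> out) {
        if (node == end) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (Arc a : arcs) {
            if (a.from == node) {
                prefix.add(a.term);
                walk(arcs, a.to, end, prefix, out);
                prefix.remove(prefix.size() - 1);
            }
        }
    }

    // Join the paths as quoted phrases separated by OR (illustrative syntax only).
    static String toOrQuery(List<List<String>> paths) {
        List<String> phrases = new ArrayList<>();
        for (List<String> p : paths) {
            phrases.add("\"" + String.join(" ", p) + "\"");
        }
        return String.join(" OR ", phrases);
    }

    public static void main(String[] args) {
        // Idealized graph: "dns" spans three positions, alongside its expansion.
        List<Token> tokens = Arrays.asList(
            new Token("dns", 1, 3),
            new Token("domain", 0, 1),
            new Token("name", 1, 1),
            new Token("service", 1, 1),
            new Token("lookup", 1, 1));
        System.out.println(toOrQuery(enumeratePaths(tokens)));
    }
}
```

The exhaustive path walk is the simplest thing that works for a small graph.
With several multi-term synonyms in one stream the number of paths multiplies,
so a real query parser would probably want to build OR clauses per sub-graph
rather than per full path.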
