lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Commented: (LUCENE-1622) Multi-word synonym filter (synonym expansion at indexing time).
Date Mon, 24 May 2010 15:06:27 GMT


Uwe Schindler commented on LUCENE-1622:

In my opinion, we should also have a very simple and user-friendly QP like Google: no syntax
at all. Just tokenize the text with the Analyzer and create a TermQuery for each token. The only
params to this QP are the field name and the default Occur enum.
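
A minimal sketch of the syntax-free parser described above, written in plain Java so it runs without Lucene on the classpath. Everything in it is an assumption made for illustration: `SimpleQueryParserSketch`, `TermClause`, the local `Occur` enum and the `analyze` method are hypothetical stand-ins, not Lucene's `Analyzer`, `TermQuery` or `BooleanClause.Occur`, and the tokenization rule (lower-case, split on anything that is not a letter or digit) is only one possible analysis chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in types, NOT Lucene classes. They only mirror the shape
// of the idea: one term clause per token, one shared Occur for every clause.
final class SimpleQueryParserSketch {

    enum Occur { MUST, SHOULD }

    static final class TermClause {
        final String field;
        final String term;
        final Occur occur;

        TermClause(String field, String term, Occur occur) {
            this.field = field;
            this.term = term;
            this.occur = occur;
        }
    }

    private final String field;
    private final Occur defaultOccur;

    // The only two parameters: the field name and the default Occur.
    SimpleQueryParserSketch(String field, Occur defaultOccur) {
        this.field = field;
        this.defaultOccur = defaultOccur;
    }

    // Stand-in for an analyzer: lower-case, then split on non-letter/digit runs.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // No query syntax is interpreted: every token becomes one term clause.
    List<TermClause> parse(String userInput) {
        List<TermClause> clauses = new ArrayList<>();
        for (String token : analyze(userInput)) {
            clauses.add(new TermClause(field, token, defaultOccur));
        }
        return clauses;
    }
}
```

With this sketch, input such as `title:Foo AND [a TO b]` is not parsed as a field query or a range. The punctuation is discarded by the analysis step and each remaining word becomes an ordinary term clause on the configured field.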

People should always create ranges and so on programmatically. Having this in a query parser
is stupid. XMLQueryParser is good for this, or maybe we also get a JSON query parser (I have
plans to create one similar to XML Query Parser, maybe using the same builders). Mark Miller
was talking about this for Solr, too.

> Multi-word synonym filter (synonym expansion at indexing time).
> ---------------------------------------------------------------
>                 Key: LUCENE-1622
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: synonyms.patch
> It would be useful to have a filter that provides support for indexing-time synonym expansion,
> especially for multi-word synonyms (with multi-word matching for original tokens).
> The problem is not trivial, as observed on the mailing list. The problems I was able
> to identify (mentioned in the unit tests as well):
> - if multi-word synonyms are indexed together with the original token stream (at overlapping
>   positions), then a query for a partial synonym sequence (e.g., "big" in the synonym "big apple"
>   for "new york city") causes the document to match;
> - there are problems with highlighting the original document when a synonym is matched
>   (see unit tests for an example);
> - if the synonym is of a different length than the original sequence of tokens to be matched,
>   then phrase queries spanning the synonym and the original sequence boundary won't be found.
>   Example: "big apple" synonym for "new york city". A phrase query "big apple restaurants" won't
>   match "new york city restaurants".
> I am posting the patch that implements phrase synonyms as a token filter. This is not
> necessarily intended for immediate inclusion, but may provide a basis for many people to experiment
> and adjust to their own scenarios.
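
The first and third problems in the quoted description can be reproduced with a toy positional index. The sketch below is plain Java and is not taken from synonyms.patch. `SynonymPositionSketch` and its methods are hypothetical, phrase matching is assumed to be exact (consecutive positions, no slop), and the synonym "big apple" is assumed to be overlaid on "new york city" starting at the same position.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy single-document positional index (hypothetical, not Lucene code).
final class SynonymPositionSketch {

    // term -> set of token positions at which the term occurs
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(String term, int position) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(position);
    }

    boolean matchesTerm(String term) {
        return postings.containsKey(term);
    }

    // Exact phrase match: term i must occur at position start + i.
    boolean matchesPhrase(String... terms) {
        if (terms.length == 0) {
            return false;
        }
        Set<Integer> none = Collections.emptySet();
        for (int start : postings.getOrDefault(terms[0], none)) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = postings.getOrDefault(terms[i], none).contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // "new york city restaurants" with "big apple" overlaid on positions 0..1.
    static SynonymPositionSketch example() {
        SynonymPositionSketch doc = new SynonymPositionSketch();
        String[] original = {"new", "york", "city", "restaurants"};
        for (int i = 0; i < original.length; i++) {
            doc.add(original[i], i);
        }
        doc.add("big", 0);   // overlaps "new"
        doc.add("apple", 1); // overlaps "york"; nothing overlaps "city"
        return doc;
    }
}
```

In this model a query for the single term "big" matches the document, which is the partial-synonym problem. The phrase "big apple restaurants" does not match, because "apple" sits at position 1 and "restaurants" at position 3, leaving a gap where the third original token is.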

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
