lucene-dev mailing list archives

From "Nolan Lawson (JIRA)" <>
Subject [jira] [Commented] (SOLR-4381) Query-time multi-word synonym expansion
Date Fri, 01 Feb 2013 14:04:13 GMT


Nolan Lawson commented on SOLR-4381:

Could you specify which private methods in eDisMax you needed to copy/paste? Perhaps we can
look at how to make it more extension friendly?
[These lines|].

If this issue is to be seriously pursued as part of edismax, the following should be included
here in JIRA:

I don't think it should be included in EDisMax itself.  Extending EDisMax was just a temporary
shortcut I took, but [Jan points out|]
that the solution itself could be applied outside EDisMax, or even outside Solr.
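
To make the "outside Solr" point concrete, here is a minimal sketch of the lattice idea (all possible synonym combinations for a query) in plain Java with no Solr or Lucene dependency. It is not the code from the patch: the class name {{SynonymLatticeSketch}}, the {{expand}} method, and the in-memory synonym map are all hypothetical, and the real implementation goes through the SynonymFilterFactory and hands each combination to EDisMax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: enumerates every path through a synonym lattice. */
public class SynonymLatticeSketch {

    /** Returns every rewriting of the query; the unmodified query comes first. */
    static List<String> expand(String query, Map<String, List<String>> synonyms) {
        List<String> tokens = Arrays.asList(query.trim().split("\\s+"));
        Set<String> results = new LinkedHashSet<>(); // keeps order, drops duplicates
        walk(tokens, 0, new ArrayList<>(), synonyms, results);
        return new ArrayList<>(results);
    }

    private static void walk(List<String> tokens, int pos, List<String> path,
                             Map<String, List<String>> synonyms, Set<String> out) {
        if (pos == tokens.size()) {
            out.add(String.join(" ", path));
            return;
        }
        // First edge: keep the original token at this position.
        path.add(tokens.get(pos));
        walk(tokens, pos + 1, path, synonyms, out);
        path.remove(path.size() - 1);

        // Other edges: any phrase (one or more tokens) starting here that has
        // synonyms is replaced as a unit, and the walk resumes after the phrase.
        for (int end = pos + 1; end <= tokens.size(); end++) {
            String phrase = String.join(" ", tokens.subList(pos, end));
            for (String alt : synonyms.getOrDefault(phrase, List.of())) {
                path.add(alt);
                walk(tokens, end, path, synonyms, out);
                path.remove(path.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical synonym table; a real setup would read synonyms.txt.
        Map<String, List<String>> synonyms = new LinkedHashMap<>();
        synonyms.put("dog", List.of("hound", "canis familiaris"));
        synonyms.put("canis familiaris", List.of("dog"));
        for (String q : expand("dog bites man", synonyms)) {
            System.out.println(q);
        }
    }
}
```

Because multi-word phrases are matched against the raw token list before any per-token analysis, "canis familiaris" is treated as one unit. In a real parser each resulting string would be parsed separately and the results combined, rather than printed.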

1. A concise summary of the overall approach, with key technical details.

Please see [this blog post|]
for the best explanation.

2. A few example queries, both source and the resulting "parsed query". Key test cases, if
you will.

Good idea.  [Added to the README.|]

3. A semi-detailed summary of what the user of the change needs to know, in terms of how to
set it up, manage it, use it, and its precise effects.

[In the README|] for now.

4. Detail any limitations.

Currently handling this in the [Issues page|].
Otherwise the standard query-time expansion concerns apply: query execution is slower,
configuration lives in the request parameters instead of {{schema.xml}}, and the query becomes
bloated and incomprehensible.  There is also potential user confusion about the single "best
practice" solution for synonyms in Solr, since Solr already has a well-documented way of handling
synonyms through the [SynonymFilterFactory|].
As of right now, I assume people will only use my solution if they try the standard solution
and are unsatisfied.

5. Specifically what features of the Synonym Filter will be lost by using this approach.

As far as I know, none, because [I'm still using the SynonymFilterFactory|]
and it's configurable by the user.

In general, I agree with you that some rapid iteration outside of the Solr core would probably
be a better approach than outright integration.  Please consider my "merge request" withdrawn;
I'll let the code incubate for a bit, and then look into integration later.

> Query-time multi-word synonym expansion
> ---------------------------------------
>                 Key: SOLR-4381
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nolan Lawson
>            Priority: Minor
>              Labels: multi-word, queryparser, synonyms
>             Fix For: 4.2, 5.0
>         Attachments: SOLR-4381-2.patch, SOLR-4381.patch
> This is an issue that seems to come up perennially.
> The [Solr docs|] caution that index-time synonym expansion should be preferred to query-time
> synonym expansion, due to the way multi-word synonyms are treated and how IDF values can be
> boosted artificially. But query-time expansion should have huge benefits, given that changes
> to the synonyms don't require re-indexing, the index size stays the same, and the IDF values
> for the documents don't get permanently altered.
> The proposed solution is to move the synonym expansion logic out of the analysis chain
> (either query- or index-type) and into a new QueryParser.  See the attached patch for an
> implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is extended,
> and synonym expansion is done on-the-fly.  Queries are parsed into a lattice (i.e. all possible
> synonym combinations), while individual components of the query are still handled by the
> EDismaxQParser.
> It's not an ideal solution by any stretch. But it's nice and self-contained, so it invites
> experimentation and improvement.  And I think it fits in well with the merry band of misfit
> query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog post|]
> and [the Github page for the code|].
> At the risk of tooting my own horn, I also think this patch sufficiently fixes SOLR-3390
> (highlighting problems with multi-word synonyms) and LUCENE-4499 (better support for multi-word
