lucene-dev mailing list archives

From "Nolan Lawson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4381) Query-time multi-word synonym expansion
Date Fri, 01 Feb 2013 14:04:13 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568745#comment-13568745 ]

Nolan Lawson commented on SOLR-4381:
------------------------------------

{quote}
Could you specify which private methods in eDisMax you needed to copy/paste? Perhaps we can
look at how to make it more extension friendly?
{quote}
[These lines|https://github.com/healthonnet/hon-lucene-synonyms/blob/master/src/main/java/org/apache/solr/search/SynonymExpandingExtendedDismaxQParserPlugin.java#L494].

{quote}
If this issue is to be seriously pursued as part of edismax, the following should be included
here in JIRA:
{quote}

I don't think it should be included in EDisMax itself.  Extending EDisMax was just a temporary
shortcut I took, but [Jan points out|https://github.com/healthonnet/hon-lucene-synonyms/issues/6]
that the solution itself could be applied outside EDisMax, or even outside Solr.

{quote}
1. A concise summary of the overall approach, with key technical details.
{quote}

Please see [this blog post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/]
for the best explanation.

{quote}
2. A few example queries, both source and the resulting "parsed query". Key test cases, if
you will.
{quote}

Good idea.  [Added to the README.|https://github.com/healthonnet/hon-lucene-synonyms#tweaking-the-results]

{quote}
3. A semi-detailed summary of what the user of the change needs to know, in terms of how to
set it up, manage it, use it, and its precise effects.
{quote} 

[In the README|https://github.com/healthonnet/hon-lucene-synonyms#query-parameters] for now.

{quote}
4. Detail any limitations.
{quote}

I'm currently tracking these on the [Issues page|https://github.com/healthonnet/hon-lucene-synonyms/issues?state=open].
Otherwise the standard query-time expansion concerns apply: query execution gets slower,
configuration lives in the request parameters instead of {{schema.xml}}, and the query becomes
bloated and incomprehensible.  There is also potential user confusion about the single "best
practice" solution for synonyms in Solr, since Solr already has a well-documented way of
handling synonyms through the [SynonymFilterFactory|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory].
As of right now, I assume people will only use my solution if they try the standard solution
and are unsatisfied.

{quote}
4. Specifically what features of the Synonym Filter will be lost by using this approach.
{quote}

As far as I know, none, because [I'm still using the SynonymFilterFactory|https://github.com/healthonnet/hon-lucene-synonyms/blob/master/README.md#step-6]
and it's configurable by the user.

In general, I agree with you that some rapid iteration outside of the Solr core would probably
be a better approach than outright integration.  Please consider my "merge request" withdrawn;
I'll let the code incubate for a bit, and then look into integration later.
                
> Query-time multi-word synonym expansion
> ---------------------------------------
>
>                 Key: SOLR-4381
>                 URL: https://issues.apache.org/jira/browse/SOLR-4381
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nolan Lawson
>            Priority: Minor
>              Labels: multi-word, queryparser, synonyms
>             Fix For: 4.2, 5.0
>
>         Attachments: SOLR-4381-2.patch, SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
caution that index-time synonym expansion should be preferred to query-time synonym expansion,
due to the way multi-word synonyms are treated and how IDF values can be boosted artificially.
But query-time expansion should have huge benefits, given that changes to the synonyms don't
require re-indexing, the index size stays the same, and the IDF values for the documents don't
get permanently altered.
> The proposed solution is to move the synonym expansion logic out of the analysis chain
(either query- or index-time) and into a new QueryParser.  See the attached patch for an implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is extended,
and synonym expansion is done on-the-fly.  Queries are parsed into a lattice (i.e. all possible
synonym combinations), while individual components of the query are still handled by the EDismaxQParser
itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, so it invites
experimentation and improvement.  And I think it fits in well with the merry band of misfit
query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/]
and [the Github page for the code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently fixes SOLR-3390
(highlighting problems with multi-word synonyms) and LUCENE-4499 (better support for multi-word
synonyms).
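
The "lattice" mentioned in the issue description (all possible synonym combinations of a query) can be illustrated with a small sketch. This is not the code from the patch or from hon-lucene-synonyms; it is a standalone Python illustration, and the function name {{expand_query}}, the synonym groups, and the sample query are all made up for the example. It assumes synonyms are given as groups of interchangeable phrases, and it only enumerates the alternative phrasings; in the approach described above, each phrasing would still be handed to the EDismaxQParser for the actual parsing.

```python
def expand_query(query, synonym_groups):
    """Return every phrasing of `query` reachable by swapping synonym phrases.

    Hypothetical helper for illustration only; not part of Solr or the patch.
    `synonym_groups` is a list of lists of interchangeable phrases.
    """
    tokens = tuple(query.split())

    # Map each phrase (as a tuple of tokens) to all phrases in its group.
    lookup = {}
    for group in synonym_groups:
        phrases = [tuple(phrase.split()) for phrase in group]
        for phrase in phrases:
            lookup[phrase] = phrases
    max_len = max((len(phrase) for phrase in lookup), default=0)

    def paths(start):
        """All token sequences that cover tokens[start:]."""
        if start == len(tokens):
            return [()]
        # Edge 1: keep the original token and advance one position.
        found = [(tokens[start],) + rest for rest in paths(start + 1)]
        # Edge 2: for each known phrase starting here, substitute a synonym
        # and skip past the whole matched phrase.
        for length in range(1, min(max_len, len(tokens) - start) + 1):
            phrase = tokens[start:start + length]
            for alternative in lookup.get(phrase, ()):
                if alternative != phrase:
                    found.extend(
                        alternative + rest for rest in paths(start + length)
                    )
        return found

    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [" ".join(path) for path in dict.fromkeys(paths(0))]


if __name__ == "__main__":
    # Made-up synonym groups, including multi-word entries.
    groups = [["dog", "hound", "canis familiaris"], ["man", "human being"]]
    for phrasing in expand_query("dog bites man", groups):
        print(phrasing)
```

The number of phrasings is the product of the group sizes along the query, which is one concrete way to see the "bloated query" limitation noted in the comment above.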


