lucene-solr-user mailing list archives

From Roman Chyla <roman.ch...@gmail.com>
Subject Re: Searching w/explicit Multi-Word Synonym Expansion
Date Wed, 17 Jul 2013 16:13:55 GMT
Since I can't see into users' heads, I may be making different assumptions
- but OK, it seems reasonable that only a minority of users here are actually
willing to do more (btw, I've received coding advice in the past here on
this list). I am working under the assumption that Lucene/SOLR devs are
swamped (there are always more requests and many unclosed JIRA issues), so
where else would they get a helping hand than from users of this list? Users
like me, for example.

roman


On Wed, Jul 17, 2013 at 11:59 AM, Jack Krupansky <jack@basetechnology.com> wrote:

> Remember, this is the "users" list, not the "dev" list. Users want to know
> what they can do and use off the shelf today, not what "could" be
> developed. Hopefully, the situation will be brighter in six months or a
> year, but today... is today, not tomorrow.
>
> (And, in fact, users can use LucidWorks Search for query-time phrase
> synonyms, off-the-shelf, today, no patches required.)
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Roman Chyla
> Sent: Wednesday, July 17, 2013 11:44 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Searching w/explicit Multi-Word Synonym Expansion
>
> OK, let's do a simple test instead of making claims - take your solr
> instance, version 4.0 or later
>
> In your schema.xml, pick a field and add the synonym filter
>
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                    ignoreCase="true" expand="true"
> tokenizerFactory="solr.KeywordTokenizerFactory" />
>
> in your synonyms.txt, add these entries:
>
> hubble\0space\0telescope, HST
>
> ATTENTION: the \0 is a null byte, and it must be written as a null byte! You
> can do it with: python -c "print \"hubble\0space\0telescope,HST\"" >
> synonyms.txt
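
The one-liner above relies on the Python 2 print statement. A minimal Python 3 sketch of the same step might look like the following. The helper name write_synonyms and the temporary output directory are illustrative assumptions, not part of the original message; the file would need to end up wherever your core reads synonyms.txt from.

```python
import tempfile
from pathlib import Path


def write_synonyms(path, phrase_words, synonym):
    """Write one synonym rule whose phrase words are joined by NUL characters."""
    entry = "\0".join(phrase_words) + ", " + synonym + "\n"
    # Write bytes explicitly so the NUL separators are not altered on the way out.
    Path(path).write_bytes(entry.encode("utf-8"))
    return entry


# Assumption: a temporary directory stands in for the core's conf directory.
out_path = Path(tempfile.mkdtemp()) / "synonyms.txt"
entry = write_synonyms(out_path, ["hubble", "space", "telescope"], "HST")
print(repr(entry))
```

Printing the repr makes the NUL characters visible, which is a quick way to check that they were not silently replaced by spaces.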
>
> send a phrase query q=field:"hubble space telescope"&debugQuery=true
>
> if you have done it right, you will see 'HST' is in the list - this means,
> solr is able to recognize the multi-token synonym! As far as recognition is
> concerned, there is no need for more work on FST.
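
When the phrase query is sent from a script rather than typed into a browser, the quotes and spaces need URL encoding. A small sketch, assuming a Solr select handler at http://localhost:8983/solr/collection1/select (that address, the core name, and the helper name are assumptions; substitute your own):

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_debug_query_url(select_url, field, phrase):
    """Build a select URL for a quoted phrase query with debugQuery enabled."""
    params = {"q": '{}:"{}"'.format(field, phrase), "debugQuery": "true"}
    # urlencode percent-encodes the colon and quotes and turns spaces into '+'.
    return select_url + "?" + urlencode(params)


# Assumption: host, port and core name are placeholders for a local instance.
SELECT_URL = "http://localhost:8983/solr/collection1/select"
url = build_debug_query_url(SELECT_URL, "field", "hubble space telescope")
print(url)
# Decode the query string again to confirm the phrase survived encoding.
print(parse_qs(urlsplit(url).query)["q"][0])
```

The resulting URL can be fetched with any HTTP client; the debug section of the response is where the parsed query shows up.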
>
> I have written a big unit test that proves the point (9 months ago,
> LUCENE-4499) without any changes to how the FST works. What is missing is
> a query parser that can take advantage of it - another JIRA issue.
>
> I'll repeat my claim now: the solution(s) are there, they solve the problem
> completely - they are not inside one JIRA issue, but they are there. They
> need to be proven wrong, NOT proclaimed incomplete.
>
>
> roman
>
>
> On Wed, Jul 17, 2013 at 10:22 AM, Jack Krupansky <jack@basetechnology.com> wrote:
>
>> To the best of my knowledge, there is no patch or collection of patches
>> which constitutes a "working solution" - just partial solutions.
>>
>> Yes, it is true, there is some FST work underway (active??) that shows
>> promise depending on query parser implementation, but again, this is all a
>> longer-term future, not a "here and now". Maybe in the 5.0 timeframe?
>>
>> I don't want anyone to get the impression that there are off-the-shelf
>> patches that completely solve the synonym phrase problem. Yes, progress is
>> being made, but we're not there yet.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Roman Chyla
>> Sent: Wednesday, July 17, 2013 9:58 AM
>> To: solr-user@lucene.apache.org
>>
>> Subject: Re: Searching w/explicit Multi-Word Synonym Expansion
>>
>> Hi all,
>>
>> What I find very 'sad' is that Lucene/SOLR contain all the necessary
>> components for handling multi-token synonyms; the Finite State Automaton
>> works perfectly for matching these items; the biggest problem is IMO the
>> old query parser, which splits things on spaces and doesn't know how to be
>> smarter.
>>
>> THIS IS A LONG-TIME PROBLEM - THERE EXIST SEVERAL WORKING SOLUTIONS (but
>> none was committed...sigh, we are re-inventing the wheel all the time...)
>>
>> LUCENE-1622
>> LUCENE-4381
>> LUCENE-4499
>>
>>
>> The problem of synonym expansion is more difficult because of the parsing -
>> the default parsers are not flexible and they split on whitespace -
>> recently I have proposed a solution which also makes multi-token
>> synonym expansion simple
>>
>> this is the ticket:
>> https://issues.apache.org/jira/browse/LUCENE-5014
>>
>>
>> that query parser is able to split on spaces, then look back and do a second
>> pass to see whether to expand with synonyms - and even discover different
>> parse paths and construct different queries based on that. If you want to
>> see some complex examples, look at:
>> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/apache/solr/analysis/TestAdsabsTypeFulltextParsing.java
>> - e.g. lines 373 and 483
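
The "split on spaces, then look back" idea can be illustrated with a toy sketch. This is not the LUCENE-5014 parser and does not use any Lucene API; the function name, the dictionary layout, and the longest-match rule are all assumptions chosen to keep the example short. The first pass splits on whitespace; the second pass scans the tokens for the longest run that matches a known multi-word synonym.

```python
def expand_synonyms(tokens, synonyms):
    """Group tokens into phrases and attach synonyms where a phrase matches.

    tokens:   list of whitespace-split query terms (first pass).
    synonyms: dict mapping a tuple of lowercased words to a list of synonyms.
    Returns a list of (text, synonym_list) pairs.
    """
    max_len = max((len(key) for key in synonyms), default=1)
    output = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Second pass: try the longest candidate phrase starting at i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in synonyms:
                match_len = n
                break
        if match_len:
            key = tuple(t.lower() for t in tokens[i:i + match_len])
            output.append((" ".join(tokens[i:i + match_len]), synonyms[key]))
            i += match_len
        else:
            # No synonym starts here; keep the single token as-is.
            output.append((tokens[i], []))
            i += 1
    return output


rules = {("hubble", "space", "telescope"): ["HST"]}
result = expand_synonyms("launch of Hubble Space Telescope".split(), rules)
print(result)
```

A real parser would also have to consider alternative parse paths, as the message notes; this sketch commits to the longest match only.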
>>
>>
>> Lucene/SOLR developers are already doing great work and have much to do -
>> they need help from everybody who is able to apply a patch, test it and
>> report back to JIRA.
>>
>> roman
>>
>>
>>
>> On Wed, Jul 17, 2013 at 9:37 AM, dmarini <david.marini.78@gmail.com> wrote:
>>
>>> iorixxx,
>>>
>>> Thanks for pointing me in the direction of the QueryElevation component. If
>>> it did not require that the target documents be keyed by the unique key
>>> field it would be ideal, but since our Sku field is not the unique field (we
>>> have an internal id which serves as the key while this is the client's key)
>>> it doesn't seem like it will match unless I make a larger scope change.
>>>
>>> Jack,
>>>
>>> I agree that out of the box there hasn't been a generalized solution for
>>> this yet. I guess what I'm looking for is confirmation that I've gone as far
>>> as I can properly and from this point need to consider using something like
>>> the HON custom query parser component (which we're leery of using because
>>> from my reading it solves a specific scenario that may overcompensate for
>>> what we're attempting to fix). I would personally rather stay IN solr than
>>> add custom .jar files from around the web if at all possible.
>>>
>>> Thanks for the replies.
>>>
>>> --Dave
>>>
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Searching-w-explicit-Multi-Word-Synonym-Expansion-tp4078469p4078610.html
>>>
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>
