lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: both way synonyms with ManagedSynonymFilterFactory
Date Tue, 01 Mar 2016 22:52:14 GMT
Thanks for reporting!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 1 Mar 2016, at 13:31, Bjørn Hjelle <bjorn.hjelle@gmail.com> wrote:
> 
> Thanks a lot for following up on this and creating the patch!
> 
> On Thu, Feb 25, 2016 at 2:49 PM, Jan Høydahl <jan.asf@cominvent.com> wrote:
> 
>> Created https://issues.apache.org/jira/browse/SOLR-8737 to handle this
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> On 22 Feb 2016, at 11:21, Jan Høydahl <jan.asf@cominvent.com> wrote:
>>> 
>>> Hi
>>> 
>>> Did you get any further with this?
>>> I reproduced your situation with Solr 5.5.
>>> 
>>> I think the issue here is that when the SynonymFilter is created from the
>>> managed map, the option “expand” is always set to “false”, while the
>>> default for a file-based synonym dictionary is “true”.
>>> 
>>> So with expand=false, the input word (e.g. “mb”) is *replaced* with the
>>> synonym “megabytes”. Confusingly enough, when synonyms are applied on both
>>> the index and the query side, your document will contain “megabytes”
>>> instead of “mb”, but when you query for “mb”, the same happens on the
>>> query side, so you will actually match :-)
>>> 
>>> I think what we need is to switch the default to expand=true, and also
>>> make it configurable in the managed factory.
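
To make the replace-versus-expand behaviour described above concrete, here is a minimal Python sketch. It is a toy model, not Lucene's SynonymFilter: the names `MANAGED_MAP`, `analyze`, `search`, and `keep_original` are made up for illustration, tokenization is plain whitespace splitting, and a document counts as a match when it shares any term with the analyzed query. The map and the two documents are the ones from the reproduction steps quoted further down.

```python
# Toy model only; this is NOT how Lucene's SynonymFilter is implemented.
# The map mirrors the "managedMap" listing quoted in the thread.
MANAGED_MAP = {
    "mad": ["angry", "upset"],
    "mb": ["megabytes"],
    "megabytes": ["mb"],
}

# The two documents posted in the reproduction steps.
DOCS = {
    "1": "10 megabytes makes me mad",
    "2": "100 mb should be sufficient",
}


def analyze(text, keep_original):
    """Return the set of terms produced for `text`.

    keep_original=False models the replace behaviour (expand=false);
    keep_original=True models keeping the input token next to its synonyms.
    """
    terms = set()
    for token in text.split():
        synonyms = MANAGED_MAP.get(token)
        if synonyms is None:
            terms.add(token)  # no mapping: the token passes through unchanged
            continue
        terms.update(synonyms)
        if keep_original:
            terms.add(token)
    return terms


def search(query, keep_original):
    """Return sorted ids of documents sharing a term with the analyzed query."""
    query_terms = analyze(query, keep_original)
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if analyze(text, keep_original) & query_terms
    )


if __name__ == "__main__":
    for flag in (False, True):
        for q in ("mb", "megabytes", "mad"):
            print(f"keep_original={flag} q={q!r} -> {search(q, flag)}")
```

In this model, `keep_original=False` reproduces the reported results (each of the two queries finds only the document that contains the literal query term), while `keep_original=True` lets both queries find both documents.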
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
>>>> On 11 Feb 2016, at 10:16, Bjørn Hjelle <bjorn.hjelle@gmail.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> One-way managed synonyms seem to work fine, but I cannot get two-way
>>>> synonyms to work.
>>>> 
>>>> Steps to reproduce with Solr 5.4.1:
>>>> 
>>>> 1. create a core:
>>>> $ bin/solr create_core -c test -d server/solr/configsets/basic_configs
>>>> 
>>>> 2. edit schema.xml so fieldType text_general looks like this:
>>>> 
>>>>  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>>>>    <analyzer>
>>>>      <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>      <filter class="solr.ManagedSynonymFilterFactory" managed="english"/>
>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>    </analyzer>
>>>>  </fieldType>
>>>> 
>>>> 3. reload the core:
>>>> 
>>>> $ curl -X GET "http://localhost:8983/solr/admin/cores?action=RELOAD&core=test"
>>>> 
>>>> 4. add synonyms (one one-way, one two-way), then reload the core again:
>>>> 
>>>> $ curl -X PUT -H 'Content-type:application/json' \
>>>>     --data-binary '{"mad":["angry","upset"]}' \
>>>>     "http://localhost:8983/solr/test/schema/analysis/synonyms/english"
>>>> $ curl -X PUT -H 'Content-type:application/json' \
>>>>     --data-binary '["mb","megabytes"]' \
>>>>     "http://localhost:8983/solr/test/schema/analysis/synonyms/english"
>>>> $ curl -X GET "http://localhost:8983/solr/admin/cores?action=RELOAD&core=test"
>>>> 
>>>> 5. list the synonyms:
>>>> {
>>>> "responseHeader":{
>>>>  "status":0,
>>>>  "QTime":0},
>>>> "synonymMappings":{
>>>>  "initArgs":{"ignoreCase":false},
>>>>  "initializedOn":"2016-02-11T09:00:50.354Z",
>>>>  "managedMap":{
>>>>    "mad":["angry",
>>>>      "upset"],
>>>>    "mb":["megabytes"],
>>>>    "megabytes":["mb"]}}}
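
The listing above suggests how the two PUT payloads from step 4 end up in the managed map: the JSON object adds a mapping from its key to the listed synonyms, while the JSON list adds a mapping from each term to the other terms in the list. A small Python sketch of that reading follows. `apply_update` is a hypothetical helper written for this illustration, not Solr's implementation, and it assumes that object values are always lists.

```python
def apply_update(managed_map, payload):
    """Merge one PUT payload into `managed_map` (toy model, not Solr code).

    A dict payload maps each key to its listed synonyms.
    A list payload maps every term to all the other terms in the list.
    """
    if isinstance(payload, dict):
        pairs = [(term, list(synonyms)) for term, synonyms in payload.items()]
    else:
        pairs = [
            (term, [other for other in payload if other != term])
            for term in payload
        ]
    for term, synonyms in pairs:
        existing = managed_map.setdefault(term, [])
        for synonym in synonyms:
            if synonym not in existing:  # avoid duplicate entries
                existing.append(synonym)
    return managed_map


if __name__ == "__main__":
    managed = {}
    apply_update(managed, {"mad": ["angry", "upset"]})
    apply_update(managed, ["mb", "megabytes"])
    print(managed)
```

Applying the two payloads from step 4 in order yields the same three entries as the `managedMap` in the listing.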
>>>> 
>>>> 
>>>> 6. add two documents:
>>>> 
>>>> $ bin/post -c test -type 'application/json' -d '[{"id" : "1", "title_t" : "10 megabytes makes me mad" },{"id" : "2", "title_t" : "100 mb should be sufficient" }]'
>>>> $ bin/post -c test -type 'application/json' -d '[{"id" : "2", "title_t" : "100 mb should be sufficient" }]'
>>>> 
>>>> 7. search for the documents:
>>>> 
>>>> - all these return the first document, so one-way synonyms work:
>>>> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:angry&indent=true"
>>>> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:upset&indent=true"
>>>> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:mad&indent=true"
>>>> 
>>>> - this only returns the document with "mb":
>>>> 
>>>> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:mb&indent=true"
>>>> 
>>>> - this only returns the document with "megabytes"
>>>> 
>>>> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:megabytes&indent=true"
>>>> 
>>>> 
>>>> Any input on how to make this work would be appreciated.
>>>> 
>>>> Thanks,
>>>> Bjørn
>>> 
>> 
>> 

