lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: both way synonyms with ManagedSynonymFilterFactory
Date Thu, 25 Feb 2016 13:49:26 GMT
Created https://issues.apache.org/jira/browse/SOLR-8737 to handle this

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 22 Feb 2016, at 11:21, Jan Høydahl <jan.asf@cominvent.com> wrote:
> 
> Hi
> 
> Did you get any further with this?
> I reproduced your situation with Solr 5.5.
> 
> I think the issue here is that when the SynonymFilter is created from the managed map,
> the “expand” option is always set to “false”, while the default for a file-based synonym
> dictionary is “true”.
> 
> So with expand=false, the input word (e.g. “mb”) is *replaced* with the synonym
> “megabytes”. Confusingly enough, when synonyms are applied on both the index and the
> query side, your document will contain “megabytes” instead of “mb”, but when you query
> for “mb”, the same replacement happens on the query side, so you will actually match :-)
> 
> I think what we need is to switch the default to expand=true, and also make it
> configurable in the managed factory.
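
To make the replace-versus-expand difference concrete, here is a minimal sketch in Python. It is a toy model and not Solr's SynonymFilter: the names `analyze`, `search`, `MANAGED_MAP` and `DOCS` are made up for illustration, tokenization is plain whitespace splitting, and the `expand` flag simply decides whether the original token is kept next to its synonyms. The map and documents are the ones from the report quoted below.

```python
# Toy model of the behaviour described above; NOT Solr's implementation.
# MANAGED_MAP mirrors the "managedMap" from the report.
MANAGED_MAP = {
    "mad": ["angry", "upset"],
    "mb": ["megabytes"],
    "megabytes": ["mb"],
}

# The two documents from the report (title_t values).
DOCS = {
    "1": "10 megabytes makes me mad",
    "2": "100 mb should be sufficient",
}


def analyze(text, synonyms, expand):
    """Return the set of terms emitted for `text`.

    expand=False: a token with a mapping is replaced by its synonyms.
    expand=True:  the original token is kept alongside its synonyms.
    """
    terms = set()
    for token in text.lower().split():
        mapped = synonyms.get(token)
        if mapped is None:
            terms.add(token)
        else:
            terms.update(mapped)
            if expand:
                terms.add(token)
    return terms


def search(query, expand):
    """Apply the same analysis at index and query time; return matching ids."""
    query_terms = analyze(query, MANAGED_MAP, expand)
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if analyze(text, MANAGED_MAP, expand) & query_terms
    )


if __name__ == "__main__":
    for expand in (False, True):
        for query in ("mb", "megabytes"):
            print(f"expand={expand} q={query} -> {search(query, expand)}")
```

In this model, expand=False gives `['2']` for "mb" and `['1']` for "megabytes", which is the behaviour from the report, while expand=True returns both documents for either query.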
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> On 11 Feb 2016, at 10:16, Bjørn Hjelle <bjorn.hjelle@gmail.com> wrote:
>> 
>> Hi,
>> 
>> one-way managed synonyms seem to work fine, but I cannot make two-way
>> synonyms work.
>> 
>> Steps to reproduce with Solr 5.4.1:
>> 
>> 1. create a core:
>> $ bin/solr create_core -c test -d server/solr/configsets/basic_configs
>> 
>> 2. edit schema.xml so fieldType text_general looks like this:
>> 
>>   <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>>     <analyzer>
>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>       <filter class="solr.ManagedSynonymFilterFactory" managed="english"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>     </analyzer>
>>   </fieldType>
>> 
>> 3. reload the core:
>> 
>> $ curl -X GET "http://localhost:8983/solr/admin/cores?action=RELOAD&core=test"
>> 
>> 4. add one one-way synonym and one two-way synonym, then reload the core again:
>> 
>> $ curl -X PUT -H 'Content-type:application/json' \
>>     --data-binary '{"mad":["angry","upset"]}' \
>>     "http://localhost:8983/solr/test/schema/analysis/synonyms/english"
>> $ curl -X PUT -H 'Content-type:application/json' \
>>     --data-binary '["mb","megabytes"]' \
>>     "http://localhost:8983/solr/test/schema/analysis/synonyms/english"
>> $ curl -X GET "http://localhost:8983/solr/admin/cores?action=RELOAD&core=test"
>> 
>> 5. list the synonyms:
>> {
>> "responseHeader":{
>>   "status":0,
>>   "QTime":0},
>> "synonymMappings":{
>>   "initArgs":{"ignoreCase":false},
>>   "initializedOn":"2016-02-11T09:00:50.354Z",
>>   "managedMap":{
>>     "mad":["angry",
>>       "upset"],
>>     "mb":["megabytes"],
>>     "megabytes":["mb"]}}}
>> 
>> 
>> 6. add two documents:
>> 
>> $ bin/post -c test -type 'application/json' -d '[
>>   {"id" : "1", "title_t" : "10 megabytes makes me mad" },
>>   {"id" : "2", "title_t" : "100 mb should be sufficient" }]'
>> $ bin/post -c test -type 'application/json' -d '[
>>   {"id" : "2", "title_t" : "100 mb should be sufficient" }]'
>> 
>> 7. search for the documents:
>> 
>> - all these return the first document, so one-way synonyms work:
>> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:angry&indent=true"
>> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:upset&indent=true"
>> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:mad&indent=true"
>> 
>> - this only returns the document with "mb":
>> 
>> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:mb&indent=true"
>> 
>> - this only returns the document with "megabytes":
>> 
>> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:megabytes&indent=true"
>> 
>> 
>> Any input on how to make this work would be appreciated.
>> 
>> Thanks,
>> Bjørn
> 

