lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: both way synonyms with ManagedSynonymFilterFactory
Date Mon, 22 Feb 2016 10:21:10 GMT
Hi

Did you get any further with this?
I reproduced your situation with Solr 5.5.

I think the issue here is that when the SynonymFilter is created from the managed map, the option
“expand” is always set to “false”, while the default for a file-based synonym dictionary
is “true”.

So with expand=false, what happens is that the input word (e.g. “mb”) is *replaced* with
the synonym “megabytes”. Confusingly enough, when synonyms are applied both on index and
query side, your document will contain “megabytes” instead of “mb”, but when you query
for “mb”, the same happens on query side, so you will actually match :-)
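
To make the replace-versus-expand difference concrete, here is a toy Python model of the
behaviour described above. It is only a sketch, not Solr code: the names analyze, search and
keep_original are made up for illustration, and it assumes that each managedMap entry maps a
term to its listed synonyms and that the same analysis runs at index and query time.

```python
# Toy model of synonym replacement vs. expansion; NOT the Solr implementation.
# The map mirrors the managedMap listed in step 5 of the quoted message.
MANAGED_MAP = {
    "mad": ["angry", "upset"],
    "mb": ["megabytes"],
    "megabytes": ["mb"],
}

# The two documents from step 6 of the quoted message.
DOCS = {
    "1": "10 megabytes makes me mad",
    "2": "100 mb should be sufficient",
}


def analyze(text, keep_original):
    """Return the set of terms produced for `text`.

    keep_original=False models the replace behaviour (input term dropped),
    keep_original=True models expansion (input term kept next to its synonyms).
    """
    terms = set()
    for word in text.lower().split():
        synonyms = MANAGED_MAP.get(word)
        if synonyms is None:
            terms.add(word)  # no mapping: the term passes through unchanged
            continue
        terms.update(synonyms)
        if keep_original:
            terms.add(word)
    return terms


def search(query, keep_original):
    """Return ids of documents sharing at least one term with the query."""
    query_terms = analyze(query, keep_original)
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if analyze(text, keep_original) & query_terms
    )


if __name__ == "__main__":
    for keep in (False, True):
        for q in ("mb", "megabytes", "mad"):
            print(keep, q, search(q, keep))
```

In this model, replacement makes a query for "mb" match only the document that originally
contained "mb", which is what the quoted report shows, while keeping the original term
makes "mb" and "megabytes" each match both documents.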

I think what we need is to switch the default to expand=true, and also make it configurable in
the managed factory.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 11 Feb 2016, at 10:16, Bjørn Hjelle <bjorn.hjelle@gmail.com> wrote:
> 
> Hi,
> 
> one-way managed synonyms seem to work fine, but I cannot make both-way
> synonyms work.
> 
> Steps to reproduce with Solr 5.4.1:
> 
> 1. create a core:
> $ bin/solr create_core -c test -d server/solr/configsets/basic_configs
> 
> 2. edit schema.xml so fieldType text_general looks like this:
> 
>    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>      <analyzer>
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.ManagedSynonymFilterFactory" managed="english"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
> 
> 3. reload the core:
> 
> $ curl -X GET "http://localhost:8983/solr/admin/cores?action=RELOAD&core=test"
> 
> 4. add synonyms, one one-way synonym, one two-way, reload the core again:
> 
> $ curl -X PUT -H 'Content-type:application/json' --data-binary '{"mad":["angry","upset"]}' "http://localhost:8983/solr/test/schema/analysis/synonyms/english"
> $ curl -X PUT -H 'Content-type:application/json' --data-binary '["mb","megabytes"]' "http://localhost:8983/solr/test/schema/analysis/synonyms/english"
> $ curl -X GET "http://localhost:8983/solr/admin/cores?action=RELOAD&core=test"
> 
> 5. list the synonyms:
> {
>  "responseHeader":{
>    "status":0,
>    "QTime":0},
>  "synonymMappings":{
>    "initArgs":{"ignoreCase":false},
>    "initializedOn":"2016-02-11T09:00:50.354Z",
>    "managedMap":{
>      "mad":["angry",
>        "upset"],
>      "mb":["megabytes"],
>      "megabytes":["mb"]}}}
> 
> 
> 6. add two documents:
> 
> $ bin/post -c test -type 'application/json' -d '[{"id" : "1", "title_t" : "10 megabytes makes me mad" },{"id" : "2", "title_t" : "100 mb should be sufficient" }]'
> $ bin/post -c test -type 'application/json' -d '[{"id" : "2", "title_t" : "100 mb should be sufficient" }]'
> 
> 7. search for the documents:
> 
> - all these return the first document, so one-way synonyms work:
> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:angry&indent=true"
> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:upset&indent=true"
> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:mad&indent=true"
> 
> - this only returns the document with "mb":
> 
> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:mb&indent=true"
> 
> - this only returns the document with "megabytes":
> 
> $ curl -X GET "http://localhost:8983/solr/test/select?q=title_t:megabytes&indent=true"
> 
> 
> Any input on how to make this work would be appreciated.
> 
> Thanks,
> Bjørn

