lucene-solr-user mailing list archives

From Felipe Lahti <fla...@thoughtworks.com>
Subject Re: How should I configure Solr to support multi-word synonyms?
Date Sat, 16 Mar 2013 03:50:31 GMT
Hi,

I have also been using that plugin
(https://github.com/healthonnet/hon-lucene-synonyms) in a project, and it
has been working pretty well. Still, I think Solr should handle multi-word
synonyms natively (BTW, there is a JIRA issue for that:
https://issues.apache.org/jira/browse/SOLR-4381). One downside is that the
project doesn't have any unit tests or component tests verifying its
functionality, so you will need to be more careful and cover it with tests
yourself.

Best,


On Mon, Mar 4, 2013 at 7:32 PM, Jan Høydahl <jan.asf@cominvent.com> wrote:

> Hi,
>
> I have been using this plugin with success:
> https://github.com/healthonnet/hon-lucene-synonyms
> While it gives you multi-word synonyms, you lose the ability to have
> different synonym dictionaries per field.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 4 March 2013 at 19:40, David Sharpe
> <david.sharpe@seekersolutions.com> wrote:
>
> > Hello Solr mailing list,
> >
> > I have read many posts and run many tests, but still I cannot get
> > multi-word synonyms behaving the way I think they should. I would
> > appreciate your advice.
> >
> > Here is an example of the behaviour I am trying to achieve:
> >
> > # Given synonyms.txt
> > wordOne, phrase one
> >
> >
> >
> >   1. At index time, a document containing "wordOne" should expand to
> >   "wordOne | phrase one". A query for "wordOne" or "phrase one" should
> >   find the document, but a query for just "phrase" or "one" should not
> >   find the document.
> >
> >   2. Conversely, a document containing "phrase one" should expand to
> >   "phrase one | wordOne". A query for "wordOne" or "phrase one" should
> >   find the document. (Depending on field tokenization, I would also
> >   expect "phrase" and "one" to find the document.)
> >
> > To attempt to achieve this behaviour, I have downloaded Solr 4.1.0 and
> > made the following changes to
> > "solr-4.1.0\example\solr\collection1\conf\schema.xml":
> >
> > https://gist.github.com/sharpedavid/5072150
> >
> >
> > (Note that I set SynonymFilterFactory's
> > tokenizerFactory="solr.KeywordTokenizerFactory". This is to prevent
> > "wordOne" from being expanded to "wordOne | phrase | one".)
> >
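
(For readers who cannot open the gist, here is a minimal sketch of the kind
of field type described above. It is not the gist's actual content: the
field type name "text_syn", the StandardTokenizerFactory choice, and the
ignoreCase/expand settings are assumptions made for illustration. Only the
tokenizerFactory attribute is taken from the message.)

```xml
<!-- Hypothetical field type; the name and analyzer layout are assumptions. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- tokenizerFactory controls how the entries in synonyms.txt are
         parsed. With KeywordTokenizerFactory, "phrase one" is kept as a
         single token in the synonym rule instead of being split in two. -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true"
            expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```
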
> > Achieving the first behaviour (i.e. number one in the above list) seems
> > difficult. A query for "wordOne" returns the document, but a query for
> > "phrase one" returns nothing. I realized that the query tokenizer
> > tokenized my query for "phrase one", so I changed the query tokenizer to
> > KeywordTokenizer, which achieves the desired behaviour, but now queries
> > are not tokenized at all, which breaks other desirable behaviour.
> >
> > The second behaviour (i.e. number two in the above list) has similar
> > problems, but no solution that I can see. If the index tokenizer is
> > StandardTokenizer, "phrase one" is tokenized to "phrase | one", so the
> > equivalent synonym is not matched. If I change the index tokenizer to
> > KeywordTokenizer, it does match; however, KeywordTokenizer will treat the
> > entire field as a single token, so a document containing "something
> > phrase one something" will not match the equivalent synonym, and also a
> > query for "phrase" or "one" will not find the document.
> >
> > Thank you for your time.
> >
> > Sincerely,
> > David Sharpe
>
>


-- 
Felipe Lahti
Consultant Developer - ThoughtWorks Porto Alegre
