lucene-solr-user mailing list archives

From Roman Chyla <roman.ch...@gmail.com>
Subject Re: Multi term synonyms
Date Wed, 29 Apr 2015 06:10:56 GMT
I'm not sure I understand - the autophrasing filter will allow the
parser to see all the tokens, so that they can be parsed (and
multi-token synonyms identified). So if you are using the same
analyzer at query and index time, both should see the same
stuff.

Are you using multi-token synonyms, or just entries that look like
multi-token synonyms? In the first case, the tokens are separated by a
null byte. In the second case, they are just strings, even with
whitespace, and your synonym file must contain exactly the same entries
as your analyzer sees them (and in the same order; or you have to use
the same analyzer to load the synonym files).
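
To illustrate the "same analyzer" point, here is a minimal Python sketch (not Solr code). The `analyze` function is a made-up stand-in for an analyzer chain that lowercases and splits on non-alphanumerics, and `load_synonyms` is a hypothetical loader; both names are assumptions for illustration only. It shows why a synonym table keyed by raw strings misses, while one built through the same analyzer matches:

```python
import re

def analyze(text):
    """Toy stand-in for an analyzer chain (assumption, not Solr's):
    lowercase, then split on anything that is not a letter or digit."""
    return tuple(t for t in re.split(r"[^0-9a-z]+", text.lower()) if t)

def load_synonyms(lines, analyzer):
    """Hypothetical loader: run every comma-separated entry through the
    given analyzer and map each analyzed entry to its whole group."""
    table = {}
    for line in lines:
        group = [analyzer(entry) for entry in line.split(",")]
        for key in group:
            table[key] = group
    return table

SYNONYM_LINES = ["T-MAZ 20, POLYSORBATE 20 [MART.]"]

# Table keyed by what the analyzer actually emits.
table = load_synonyms(SYNONYM_LINES, analyze)

# Table keyed by the raw strings as typed in the synonym file.
raw_table = {entry.strip(): line
             for line in SYNONYM_LINES
             for entry in line.split(",")}

query_tokens = analyze("t-maz 20")
print(query_tokens in table)    # lookup by analyzed tokens
print("t-maz 20" in raw_table)  # lookup by raw string
```

The first lookup succeeds because both sides went through the same analysis; the second fails because the raw entry never appears in that form in the token stream.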

can you post the relevant part of your schema.xml?


note: I can confirm that multi-token synonym expansion can be made to
work, even in complex cases - we do it - but if you need multi-token
synonyms, you will likely also need a smarter query parser.
Sometimes your users will use query strings that contain overlapping
synonym entries; to handle that, you have to know how to generate
all possible 'reads'. Example:

synonym:

foo bar, foobar
hey foo, heyfoo

user input:

hey foo bar

possible readings:

((hey foo) +bar) OR (hey +(foo bar))

I'm simplifying here; the fun starts when you are dealing with a phrase query :)
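
A rough Python sketch of generating the readings for the example above. This is not how our parser is implemented; `readings` is a hypothetical helper that simply enumerates every way to split the input into single tokens and known multi-token synonym entries:

```python
def readings(tokens, phrases):
    """Return every way to split `tokens` (a tuple of strings) into
    segments, where each segment is a single token or a known phrase."""
    if not tokens:
        return [[]]
    results = []
    # Option 1: take the first token on its own.
    for rest in readings(tokens[1:], phrases):
        results.append([tokens[:1]] + rest)
    # Option 2: take any multi-token phrase that starts at this position.
    for phrase in phrases:
        n = len(phrase)
        if n > 1 and tokens[:n] == phrase:
            for rest in readings(tokens[n:], phrases):
                results.append([phrase] + rest)
    return results

# Multi-token sides of the two synonym entries from the example.
PHRASES = [("foo", "bar"), ("hey", "foo")]

for reading in readings(("hey", "foo", "bar"), PHRASES):
    print(" | ".join(" ".join(segment) for segment in reading))
```

For "hey foo bar" this produces three readings: the plain token sequence plus the two shown above. Each phrase segment would then be expanded with its synonyms and the readings OR-ed together; that step is left out here.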

On Tue, Apr 28, 2015 at 10:31 AM, Kaushik <kaushikadya@gmail.com> wrote:
> Hi there,
>
> I tried the solution provided in
> https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> and it works when the indexed data does not contain alphanumerics or
> special characters. But in my case the synonyms are something
> like the below.
>
>
>  T-MAZ 20  POLYOXYETHYLENE (20) SORBITAN MONOLAURATE  SORBITAN
> MONODODECANOATE  POLY(OXY-1,2-ETHANEDIYL) DERIVATIVE  POLYOXYETHYLENE
> SORBITAN MONOLAURATE  POLYSORBATE 20 [MART.]  SORBIMACROGOL LAURATE
> 300  POLYSORBATE
> 20 [FHFI]  FEMA NO. 2915
>
> They have alphanumerics, special characters, spaces, etc. Is there a way
> to implement synonyms even in such a case?
>
> Thanks,
> Kaushik
>
> On Mon, Apr 20, 2015 at 11:03 AM, Davis, Daniel (NIH/NLM) [C] <
> daniel.davis@nih.gov> wrote:
>
>> Handling MESH descriptor preferred terms and such is similar. I
>> encountered this during evaluation of Solr for a project here at NLM. We
>> decided to use Solr for different projects instead. I considered the
>> following approaches:
>>  - use a custom tokenizer at index time that indexed all of the multiple
>> term alternatives.
>>  - index the data, and then have an enrichment process that queries on
>> each source synonym, and generates an update to add the target synonyms.
>>    Follow this with an optimize.
>>  - During the indexing process, but before sending the data to Solr,
>> process the data to tokenize and add synonyms to another field.
>>
>> Both the custom tokenizer and enrichment process share the feature that
>> they use Solr's own tokenizer rather than duplicate it. The enrichment
>> process seems to me only workable in environments where you can re-index
>> all data periodically, i.e. where there is no continuous stream of data
>> that needs to be indexed relatively quickly once it is generated. The last
>> method of pre-processing the data seems the least desirable to me from a
>> blue-sky perspective, but is probably the easiest to implement and the most
>> independent of Solr.
>>
>> Hope this helps,
>>
>> Dan Davis, Systems/Applications Architect (Contractor),
>> Office of Computer and Communications Systems,
>> National Library of Medicine, NIH
>>
>> -----Original Message-----
>> From: Kaushik [mailto:kaushikadya@gmail.com]
>> Sent: Monday, April 20, 2015 10:47 AM
>> To: solr-user@lucene.apache.org
>> Subject: Multi term synonyms
>>
>> Hello,
>>
>> Reading up on synonyms, it looks like there is no real solution for
>> multi-term synonyms. Is that right? I have a use case where I need to map
>> one multi-term phrase to another, e.g. Tween 20 needs to be translated to
>> Polysorbate 40.
>>
>> Any thoughts as to how this can be achieved?
>>
>> Thanks,
>> Kaushik
>>
