lucene-solr-user mailing list archives

From "Davis, Daniel (NIH/NLM) [C]" <>
Subject RE: Multi term synonyms
Date Mon, 20 Apr 2015 15:03:38 GMT
Handling MeSH descriptor preferred terms and the like is a similar problem. I encountered it
during an evaluation of Solr for a project here at NLM. We decided to use Solr for different
projects instead. I considered the following approaches:
 - Use a custom tokenizer at index time that indexes all of the multi-term alternatives.
 - Index the data, then run an enrichment process that queries on each source synonym and
   generates an update to add the target synonyms. Follow this with an optimize.
 - During the indexing process, but before sending the data to Solr, process the data to
   tokenize it and add synonyms to another field.
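
As a rough illustration of the last approach, here is a minimal Python sketch of pre-processing a document before it is sent to Solr. Everything in it is an assumption for illustration only: the synonym map, the field names `text` and `synonyms`, and the very simple normalization (lowercasing and splitting on non-word characters), which stands in for whatever analysis the real index would use.

```python
import re

# Hypothetical synonym map: lower-cased source phrase -> target phrase.
SYNONYMS = {
    "tween 20": "Polysorbate 40",
}

def find_synonyms(text, synonyms=SYNONYMS):
    """Return the target phrase for every source phrase found in text."""
    # Crude normalization (an assumption, not Solr's analysis chain):
    # lowercase, then join the word characters with single spaces.
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    found = []
    for source, target in synonyms.items():
        # Word boundaries keep "tween 200" from matching "tween 20".
        if re.search(r"\b" + re.escape(source) + r"\b", normalized):
            found.append(target)
    return found

def enrich(doc, text_field="text", synonym_field="synonyms"):
    """Return a copy of doc with matched synonyms in a separate field."""
    enriched = dict(doc)
    enriched[synonym_field] = find_synonyms(doc.get(text_field, ""))
    return enriched

doc = enrich({"id": "1", "text": "Wash the plate with Tween-20 buffer."})
```

The enriched document can then be indexed as usual, with queries searching both the original field and the synonym field.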

Both the custom tokenizer and the enrichment process use Solr's own tokenizer rather than
duplicating it. The enrichment process seems to me workable only in environments where you
can re-index all data periodically, that is, where there is no continuous stream of data
that must be indexed relatively quickly once it is generated. The last method, pre-processing
the data, seems the least desirable to me from a blue-sky perspective, but it is probably
the easiest to implement and the most independent of Solr.
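
For the enrichment approach, the update step could be sketched roughly as below. This is only an assumption about how one might do it: it builds Solr atomic-update documents using the `add` modifier, takes the matching document ids as input (in practice they would come from a query on the source synonym), and uses a hypothetical multi-valued field named `synonyms`. Sending the payload to Solr and running the optimize are left out.

```python
import json

def build_atomic_updates(doc_ids, target_synonyms, field="synonyms"):
    """Build one atomic-update document per id, adding the target synonyms.

    doc_ids: ids of documents that matched the source synonym (assumed to
             come from a separate query, not shown here).
    field:   hypothetical multi-valued field that holds the synonyms.
    """
    return [
        {"id": doc_id, field: {"add": list(target_synonyms)}}
        for doc_id in doc_ids
    ]

updates = build_atomic_updates(["doc1", "doc2"], ["Polysorbate 40"])
payload = json.dumps(updates)  # JSON body for a request to the update handler
```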

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH

-----Original Message-----
From: Kaushik [] 
Sent: Monday, April 20, 2015 10:47 AM
Subject: Multi term synonyms


Reading up on synonyms, it looks like there is no real solution for multi-term synonyms. Is
that right? I have a use case where I need to map one multi-term phrase to another; for
example, Tween 20 needs to be translated to Polysorbate 40.

Any thoughts as to how this can be achieved?
