lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Arun Rangarajan <arunrangara...@gmail.com>
Subject Re: Solr SynonymFilter in Lucene analyzer
Date Thu, 19 Aug 2010 04:55:17 GMT
I think the lucene WhitespaceAnalyzer I am using inside Solr's SynonymFilter
is the one that prevents multi-word synonyms like "New York" from getting
mapped to the generic synonym name like CONCEPTYcity. It appears to me that
an analyzer which recognizes that a white-space is inside a synonym like
"New York" will be required. Do I need to implement one like this or is
there already an analyzer I can use? Looks like I am missing something here,
since Solr's SynonymFilter is supposed to handle this. Can someone tell me
what is the correct way to integrate Solr's SynonymFilter within a custom
lucene analyzer? Thanks.


On Tue, Aug 17, 2010 at 4:44 PM, Arun Rangarajan
<arunrangarajan@gmail.com>wrote:

> I am trying to have multi-word synonyms work in lucene using Solr's *
> SynonymFilter*.
>
> I need to match synonyms at index time, since many of the synonym lists are
> huge. Actually they are really not synonyms, but are words that belong to a
> concept. For example, I would like to map {"New York", "Los Angeles", "New
> Orleans", "Salt Lake City"...}, a bunch of city names, to the concept called
> "city". While searching, the user query for the concept "city" will be
> translated to a keyword like, say "CONCEPTcity", which is the synonym for
> any city name.
>
> Using lucene's SynonymAnalyzer, as explained in Lucene in Action (p. 131),
> all I could match for "CONCEPTcity" is single word city names like
> "Chicago", "Seattle", "Boston", etc., It would not match multi-word city
> names like "New York", "Los Angeles", etc.,
>
> I tried using Solr's SynonymFilter in tokenStream method in a custom
> Analyzer (that extends org.apache.lucene.analysis.
> Analyzer - lucene ver. 2.9.3) using:
>
> *    public TokenStream tokenStream(String fieldName, Reader reader) {
>         TokenStream result = new SynonymFilter(
>                 new WhitespaceTokenizer(reader),
>                 synonymMap);
>         return result;
>     }
> *
> where *synonymMap* is loaded with synonyms using
>
> *synonymMap.add(conceptTerms, listOfTokens, true, true);*
>
> where *conceptTerms* is of type *ArrayList<String>* of all the terms in a
> concept and *listofTokens* is of type *List<Token>  *and contains only the
> generic synonym identifier like *CONCEPTcity*.
>
> When I print synonymMap using synonymMap.toString(), I get the output like
>
> <{New York=<{Chicago=<{Seattle=<{New
> Orleans=....<[(CONCEPTcity,0,0,type=SYNONYM),ORIG],null>}>}>}>....}>
>
> so it looks like all the synonyms are loaded. But if I search for
> "CONCEPTcity" then it says no matches found. I am not sure whether I have
> loaded the synonyms correctly in the synonymMap.
>
> Any help will be deeply appreciated. Thanks!
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message