lucene-java-user mailing list archives

From Arun Rangarajan <arunrangara...@gmail.com>
Subject Solr SynonymFilter in Lucene analyzer
Date Tue, 17 Aug 2010 23:44:36 GMT
I am trying to get multi-word synonyms to work in Lucene using Solr's
*SynonymFilter*.

I need to match synonyms at index time, since many of the synonym lists are
huge. Actually, they are not really synonyms but words that belong to a
concept. For example, I would like to map {"New York", "Los Angeles", "New
Orleans", "Salt Lake City"...}, a bunch of city names, to the concept called
"city". At search time, a user query for the concept "city" will be
translated to a keyword such as "CONCEPTcity", which acts as the synonym for
any city name.
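
As a minimal sketch of the intended index-time behavior, the following plain Java class (no Lucene or Solr classes involved) emits a concept keyword after every phrase that matches a concept list, including multi-word phrases. The class name ConceptTagger and its methods addPhrase and tag are hypothetical names chosen for illustration; splitting on whitespace is an assumption meant to mirror a whitespace tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of Lucene or Solr.
class ConceptTagger {
    // Key: phrase tokens joined by a single space. Value: concept keyword.
    private final Map<String, String> phraseToConcept = new HashMap<>();
    private int maxPhraseLength = 1;

    // Registers a (possibly multi-word) phrase under a concept keyword.
    void addPhrase(String phrase, String concept) {
        String[] tokens = phrase.trim().split("\\s+");
        phraseToConcept.put(String.join(" ", tokens), concept);
        maxPhraseLength = Math.max(maxPhraseLength, tokens.length);
    }

    // Returns the original tokens, with the concept keyword emitted
    // right after each matched phrase. Longer phrases are tried first.
    List<String> tag(String text) {
        List<String> out = new ArrayList<>();
        if (text.trim().isEmpty()) {
            return out;
        }
        String[] tokens = text.trim().split("\\s+");
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            String concept = null;
            for (int len = Math.min(maxPhraseLength, tokens.length - i); len >= 1; len--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                concept = phraseToConcept.get(candidate);
                if (concept != null) {
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                out.add(tokens[i]);
                i++;
            } else {
                // Keep the original tokens, then add the concept keyword.
                out.addAll(Arrays.asList(tokens).subList(i, i + matched));
                out.add(concept);
                i += matched;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ConceptTagger tagger = new ConceptTagger();
        tagger.addPhrase("New York", "CONCEPTcity");
        tagger.addPhrase("Chicago", "CONCEPTcity");
        // prints [flights, from, Chicago, CONCEPTcity, to, New, York, CONCEPTcity]
        System.out.println(tagger.tag("flights from Chicago to New York"));
    }
}
```

A search for "CONCEPTcity" against text tagged this way would then hit both single-word and multi-word city names, which is the behavior wanted from the analyzer.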

Using Lucene's SynonymAnalyzer, as explained in Lucene in Action (p. 131),
all I could match for "CONCEPTcity" is single-word city names like
"Chicago", "Seattle", "Boston", etc. It would not match multi-word city
names like "New York", "Los Angeles", etc.

I tried using Solr's SynonymFilter in the tokenStream method of a custom
Analyzer (that extends org.apache.lucene.analysis.Analyzer, Lucene
ver. 2.9.3) using:

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new SynonymFilter(
                new WhitespaceTokenizer(reader),
                synonymMap);
        return result;
    }

where *synonymMap* is loaded with synonyms using

    synonymMap.add(conceptTerms, listOfTokens, true, true);

where *conceptTerms* is of type *ArrayList<String>* and holds all the terms
in a concept, and *listOfTokens* is of type *List<Token>* and contains only
the generic synonym identifier, like *CONCEPTcity*.

When I print synonymMap using synonymMap.toString(), I get output like

<{New York=<{Chicago=<{Seattle=<{New
Orleans=....<[(CATEGORYcity,0,0,type=SYNONYM),ORIG],null>}>}>}>....}>

so it looks like all the synonyms are loaded. But if I search for
"CATEGORYcity" then it says no matches found. I am not sure whether I have
loaded the synonyms correctly in the synonymMap.
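
For comparison, here is a minimal sketch of a nested token map built with plain Java collections. It is not Solr's SynonymMap; the class TokenTrie and its add method are hypothetical. It rests on one assumption that this message does not confirm: that each list passed to add is treated as a single token sequence, with one nesting level per list element. Under that assumption, passing all the terms of a concept in one call yields nesting of the same shape as the output above, while one call per phrase (split into whitespace tokens) yields sibling entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not Solr's SynonymMap.
class TokenTrie {
    private final Map<String, TokenTrie> children = new LinkedHashMap<>();
    private final List<String> replacements = new ArrayList<>();

    // Walks one level down per element of tokenSequence, then stores
    // the replacement at the final node.
    void add(List<String> tokenSequence, String replacement) {
        TokenTrie node = this;
        for (String token : tokenSequence) {
            node = node.children.computeIfAbsent(token, k -> new TokenTrie());
        }
        node.replacements.add(replacement);
    }

    @Override
    public String toString() {
        String head = replacements.isEmpty() ? "" : replacements + ",";
        String tail = children.isEmpty() ? "null" : children.toString();
        return "<" + head + tail + ">";
    }

    public static void main(String[] args) {
        // All concept terms in one call: a single chain of nested maps.
        TokenTrie oneCall = new TokenTrie();
        oneCall.add(Arrays.asList("New York", "Chicago", "Seattle"), "CONCEPTcity");
        System.out.println(oneCall);

        // One call per phrase, each split into whitespace tokens: siblings.
        TokenTrie perPhrase = new TokenTrie();
        for (String phrase : Arrays.asList("New York", "Chicago", "Seattle")) {
            perPhrase.add(Arrays.asList(phrase.split("\\s+")), "CONCEPTcity");
        }
        System.out.println(perPhrase);
    }
}
```

In the first map, only the full sequence "New York", "Chicago", "Seattle" in that order reaches the replacement, so a whitespace-tokenized document would never match it. Whether this is what happens inside the real synonymMap would need to be checked against the Solr source.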

Any help will be deeply appreciated. Thanks!
