lucene-java-user mailing list archives

From Babak Farhang <farh...@gmail.com>
Subject synonym-group filter
Date Sat, 14 Nov 2009 21:35:42 GMT
SynonymTokenFilter, if I understand correctly, maps a given token to a
set of tokens representing its synonyms. If used in the filter chain
of a query analyzer, it causes "query expansion". (Correct
terminology?) If used in the filter chain of an indexing analyzer, it
causes "index expansion".
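A minimal plain-Java sketch of that expansion, assuming a simple word-to-synonyms table. This is a model of the idea, not Lucene's TokenFilter API; the class name and the table are hypothetical. A real Lucene synonym filter would typically emit the synonyms at the same position as the original token (position increment 0), which this list-based sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model of synonym expansion; not Lucene's API. */
final class SynonymExpansionSketch {

    // Emits each token followed by every synonym listed for it (if any).
    static List<String> expand(List<String> tokens, Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            out.addAll(synonyms.getOrDefault(token, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = Map.of(
            "fast", List.of("quick", "rapid", "swift"));
        System.out.println(expand(List.of("fast", "car"), synonyms));
        // prints [fast, quick, rapid, swift, car]
    }
}
```

Every synonym becomes an extra term, so a word with many synonyms inflates either the query or the index.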

I was wondering whether anyone has implemented a synonym filter that
instead of mapping tokens to their synonyms, maps tokens to their
"synonym-groups". Again, I'm not sure this is correct IR terminology,
but borrowing from the SynonymMap implementation, what I mean by a
"synonym-group" is a set of words that are considered synonyms. If a word
can have different [contextual] meanings, then it would be a member of
multiple synonym-groups.
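One way to picture such a table is a list of word sets, inverted into a map from each word to the ids of the groups that contain it. The sketch below assumes that representation; the class name and the example groups are hypothetical and are not taken from SynonymMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical synonym-group table; a word with several senses lands in several groups. */
final class SynonymGroupsSketch {

    // Inverts the list of groups into: word -> ids of the groups containing it.
    static Map<String, List<Integer>> groupsByWord(List<Set<String>> groups) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int id = 0; id < groups.size(); id++) {
            for (String word : groups.get(id)) {
                index.computeIfAbsent(word, w -> new ArrayList<>()).add(id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<Set<String>> groups = List.of(
            Set.of("bank", "shore", "riverside"),    // group 0: edge of a river
            Set.of("bank", "lender", "depository")); // group 1: financial institution
        Map<String, List<Integer>> index = groupsByWord(groups);
        System.out.println(index.get("bank"));   // prints [0, 1]
        System.out.println(index.get("lender")); // prints [1]
    }
}
```

Here "bank" has four synonyms but belongs to only two groups.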

The idea here is to minimize the index/query "expansion" by observing
that the number of synonym-groups a word belongs to would typically be
far fewer than the number of its synonyms. Each synonym-group would be
represented by a special, unique term in the index. Unlike
SynonymTokenFilter, the filter would have to be used in both the
indexing and the query analyzer.
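A rough sketch of such a filter, again in plain Java rather than as a Lucene TokenFilter. It assumes that a grouped word is replaced by its group terms, and that ungrouped words pass through unchanged. The "_syngroup_" prefix, the class name, and the toy matches() check are all hypothetical. Because the same filter produces both the indexed terms and the query terms, "riverside" in a document matches a query for "bank" through the shared group term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical synonym-group filter model; must run on both index and query tokens. */
final class SynonymGroupFilterSketch {

    private final Map<String, List<String>> groupTermsByWord = new HashMap<>();

    SynonymGroupFilterSketch(List<Set<String>> groups) {
        for (int id = 0; id < groups.size(); id++) {
            // Assumed term format: a prefix unlikely to collide with real words.
            String groupTerm = "_syngroup_" + id;
            for (String word : groups.get(id)) {
                groupTermsByWord.computeIfAbsent(word, w -> new ArrayList<>()).add(groupTerm);
            }
        }
    }

    // Replaces each grouped token with its group terms; other tokens pass through.
    List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            List<String> groupTerms = groupTermsByWord.get(token);
            if (groupTerms == null) {
                out.add(token);
            } else {
                out.addAll(groupTerms);
            }
        }
        return out;
    }

    // Toy matching rule: true if the query shares at least one term with the document.
    static boolean matches(List<String> docTerms, List<String> queryTerms) {
        Set<String> doc = new HashSet<>(docTerms);
        for (String q : queryTerms) {
            if (doc.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SynonymGroupFilterSketch f = new SynonymGroupFilterSketch(List.of(
            Set.of("bank", "shore", "riverside"),
            Set.of("bank", "lender", "depository")));
        List<String> indexed = f.filter(List.of("riverside", "walk"));
        List<String> query = f.filter(List.of("bank"));
        System.out.println(indexed + " " + query + " " + matches(indexed, query));
        // prints [_syngroup_0, walk] [_syngroup_0, _syngroup_1] true
    }
}
```

With this scheme "bank" costs two terms (one per group) instead of five (itself plus four synonyms). A query for "lender" would not match the "riverside" document, because the two words share no group.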

This is not a new idea. See the comments in LUCENE-1622 (a tangential
topic), for example. Has anyone contributed an implementation?

-Babak

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

