lucene-java-user mailing list archives

From David Spencer <dave-lucene-u...@tropo.com>
Subject Re: indexing synonyms / reducing the index size
Date Wed, 04 May 2005 21:54:32 GMT
Pablo Gomes Ludermir wrote:

> Hello all,
> 
> I know that we can expand a word to get its synonyms with WordNet. I
> was wondering if we could reduce the index size by indexing a single
> synonym in place of every word on its synonym list.
> 
> For instance, if "screen" shows up, I would like to replace it by
> "monitor" (it is a stupid example, but it was the first thing that
> crossed my mind). Thus, instead of having both entries on the index, I
> would have only one.
> 
> Thus, I would need to pre-process any queries, replacing the words with
> their synonyms as well. I was wondering if someone has done such a thing
> in an analyzer already and could give me a little help.
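
The replacement idea quoted above can be sketched roughly as follows. This is only an illustration in plain Java, not an existing Lucene analyzer: the class CanonicalSynonymMapper and its methods are hypothetical names, and the synonym groups are assumed to come from WordNet or any other source. The same normalize() call would have to be applied to tokens at index time and to query terms, so that both sides agree on one representative per group.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps every word of a synonym group to one
// representative, so only that representative would reach the index.
class CanonicalSynonymMapper {
    private final Map<String, String> canonical = new HashMap<>();

    // Registers a synonym group; the first word becomes the representative.
    // A word already registered by an earlier group keeps its first mapping.
    void addGroup(List<String> group) {
        String head = group.get(0).toLowerCase(Locale.ROOT);
        for (String word : group) {
            canonical.putIfAbsent(word.toLowerCase(Locale.ROOT), head);
        }
    }

    // Returns the representative for a token, or the lowercased token itself
    // when it belongs to no registered group.
    String normalize(String token) {
        String key = token.toLowerCase(Locale.ROOT);
        return canonical.getOrDefault(key, key);
    }
}
```

With the group ("monitor", "screen") registered, both words normalize to "monitor", so the index would hold a single term for the pair. The trade-off is that the index is no longer unchanged: switching synonym lists means re-indexing.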

This is already done; I wrote it. See the bottom of this page and search for
"wordnet":
http://lucene.apache.org/java/docs/lucene-sandbox/

It runs in 2 phases:
[1] Parses some of WordNet and stores the synonyms in a Lucene index, which
serves as a kind of persistent Map. This is run just once.

[2] Query expansion, which does things like expanding "monitor" to "monitor
screen". This runs against an unchanged index.
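
To make the two phases concrete, here is a minimal sketch in plain Java. It is not the sandbox code: an in-memory Map stands in for the Lucene synonym index built in phase [1], and the class name SynonymQueryExpander and its expand() method are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the sandbox API. The Map plays the role of the
// persistent synonym store that phase [1] would build from WordNet.
class SynonymQueryExpander {
    private final Map<String, List<String>> synonyms;

    SynonymQueryExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Phase [2]: append each word's synonyms to the query, keeping
    // first-seen order and dropping duplicates.
    String expand(String query) {
        Set<String> terms = new LinkedHashSet<>();
        for (String word : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty query splits into one empty string
            }
            terms.add(word);
            terms.addAll(synonyms.getOrDefault(word, Collections.<String>emptyList()));
        }
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        Map<String, List<String>> store = new HashMap<>();
        store.put("monitor", Arrays.asList("screen"));
        System.out.println(new SynonymQueryExpander(store).expand("monitor"));
    }
}
```

Because the expansion happens only on the query side, the documents never need re-indexing when the synonym data changes. It does not shrink the index, though.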


> 
> My aim is to reduce the index as much as possible (I already have a
> stemmer and a stopword filter on the analyzer). Could anyone point
> out other ways to reduce the number of terms in an index?
> 
> The fact is that I would like to create "extra vectors" with my own
> weighting scheme, and it is quite a costly algorithm, so the fewer
> terms I have, the better it performs.
> 
> Regards,
> Pablo
> 

