lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pablo Gomes Ludermir <gom...@gmail.com>
Subject indexing synonyms / reducing the index size
Date Wed, 04 May 2005 21:38:18 GMT
Hello all,

I know that we can expand a word to get its synonyms with Wordnet. I
was wondering if we could reduce the index size by including a synonym
instead of a word on the synonym list.

For instance, if "screen" shows up, I would like to replace it by
"monitor" (it is a stupid example, but it was the first thing that
crossed my mind). Thus, instead of having both entries on the index, I
would have only one.

Thus, I would need to pre-process any queries, replacing the words by
its synonyms as well. I was wondering if someone has done such a thing
in an analyzer already and could give me a little help.

My aim is to reduce the index as much as possible (I already have a
stemmer and a stopword filter on the analyzer). Could anyone point
other ways to reduce the number of terms of an index?

The fact is that I would like to create "extra vectors" with my own
weighting scheme, and it is a quite costly algorithm, so the less
terms I have the better it performs.

Regards,
Pablo

-- 
Pablo Gomes Ludermir
gomesp@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message