lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: The best way forward
Date Tue, 04 Nov 2003 12:04:03 GMT

--- jt oob <jt2oob@yahoo.co.uk> wrote:
> Thank you for the replies!
> 
> My indexes are currently looking like they might be 12GB when finished
> on the current run.
> 
> I have spotted a tool on the Lucene site for listing the most
> frequently occurring words in the index. Currently I am using the
> defaultAnalyzer stoplist; I should probably use a more comprehensive
> list.
> 
> Is there a way of implementing a stoplist after the index has been
> created, removing all occurrences of the new stoplist words?
> I could then write a new Analyzer with the new stoplist for adding
> new documents to the index.
> Am I doomed to reindexing with a better stoplist?

I believe you'll need to re-index.
Well, if your old stop list is a subset of the new stop list, then you
may be able to get away without re-indexing.
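
One way to read the subset remark, sketched in plain Java rather than
against the Lucene API: if the new stop list contains every word of the
old one, then analyzing both new documents and queries with the new list
means the extra stop words are never looked up, so the old postings for
them are simply ignored. The names StopListSketch, analyze, addDocument
and search, and the toy in-memory index, are all hypothetical and exist
only for illustration. This is an interpretation of the reply, not a
statement of how Lucene behaves internally.

```java
import java.util.*;

public class StopListSketch {
    // Hypothetical stand-in for an analyzer: lowercase, split on
    // non-alphanumerics, drop stop words.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy inverted index: term -> ids of the documents containing it.
    static void addDocument(Map<String, Set<Integer>> index, int docId,
                            String text, Set<String> stopWords) {
        for (String term : analyze(text, stopWords)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: a document matches only if it contains every query term
    // that survives the stop list.
    static Set<Integer> search(Map<String, Set<Integer>> index,
                               String query, Set<String> stopWords) {
        Set<Integer> hits = null;
        for (String term : analyze(query, stopWords)) {
            Set<Integer> postings =
                index.getOrDefault(term, Collections.emptySet());
            if (hits == null) {
                hits = new TreeSet<>(postings);
            } else {
                hits.retainAll(postings);
            }
        }
        return hits == null ? new TreeSet<>() : hits;
    }

    public static void main(String[] args) {
        Set<String> oldStops = new HashSet<>(Arrays.asList("the", "a"));
        // The new list is a superset of the old one.
        Set<String> newStops =
            new HashSet<>(Arrays.asList("the", "a", "re", "wrote"));

        Map<String, Set<Integer>> index = new HashMap<>();
        // Document 1 was indexed before the stop list changed.
        addDocument(index, 1, "Re: the best way forward", oldStops);
        // Document 2 is added afterwards with the new list.
        addDocument(index, 2, "Re: indexing NNTP the best way", newStops);

        // Querying with the new list drops "re", so both documents match.
        System.out.println(search(index, "re best way", newStops)); // [1, 2]
        // Querying with the old list keeps "re" and misses document 2.
        System.out.println(search(index, "re best way", oldStops)); // [1]
    }
}
```

Note that in this sketch the postings for "re" from document 1 remain in
the index, so this approach keeps searches consistent but does not shrink
an existing index; only re-indexing removes the stale terms.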

> In view of the index size, I am going to see how well the kernel
> caching performs, as the index probably won't fit entirely into memory
> once the operating system and other system processes have taken their
> bite of the available memory.
> 
> Eventually I am going to try to implement something similar to Google
> Groups, indexing lots of NNTP traffic. Has anyone done this before
> with Lucene?

Not that I know of, but people have used Lucene to index their email,
which is somewhat similar.

Otis