lucene-java-user mailing list archives

From Otis Gospodnetic
Subject Re: The best way forward
Date Tue, 04 Nov 2003 12:04:03 GMT

--- jt oob wrote:
> Thank you for the replies!
> My indexes currently look like they might be 12GB when finished
> on the current run.
> I have spotted a tool on the Lucene site for listing the most
> frequently occurring words in the index. Currently I am using the
> default analyzer's stoplist; I should probably use a more
> comprehensive list.
> Is there a way of applying a stoplist after the index has been
> created, removing all occurrences of the new stoplist words?
> I could then write a new Analyzer with the new stoplist for adding
> new documents to the index.
> Am I doomed to reindexing with a better stoplist?

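I believe you'll need to re-index.
That said, if your old stop list is a subset of the new stop list, you
may be able to get away without re-indexing: use the new list for both
queries and new documents, so the added stop words are never searched
for. They would still take up space in the existing index, though.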
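
The subset condition is easy to check mechanically. Below is a minimal
sketch in plain Java; it does not use the Lucene API, and the class
name, method names, and word lists are all made up for illustration.
It checks that the old stop list is contained in the new one, then
shows the new list being applied to a query's terms, assuming simple
whitespace tokenization and lowercasing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class StopListCheck {

    /** Returns true when every word in oldStops also appears in newStops. */
    static boolean isSubset(Set<String> oldStops, Set<String> newStops) {
        return newStops.containsAll(oldStops);
    }

    /**
     * Lowercases the query, splits it on whitespace, and drops stop words.
     * A real analyzer would tokenize more carefully; this is an assumption
     * made to keep the sketch short.
     */
    static List<String> filterQuery(String query, Set<String> stops) {
        List<String> kept = new ArrayList<>();
        for (String token : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !stops.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example word lists, invented for this sketch.
        Set<String> oldStops = new HashSet<>(Arrays.asList("a", "the", "of"));
        Set<String> newStops = new HashSet<>(
                Arrays.asList("a", "the", "of", "wrote", "subject"));

        System.out.println("old is subset of new: " + isSubset(oldStops, newStops));
        System.out.println("query terms kept: "
                + filterQuery("the subject of lucene", newStops));
    }
}
```

If the check fails, some word was indexed as a stop word under the old
list but not the new one, so older documents would be missing terms
that newer documents have, and a full re-index is the safer route.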

> In view of the index size, I am going to see how well the kernel
> caching performs, as the index probably won't fit entirely into
> memory once the operating system and other system processes have
> taken their bite of the available memory.
> Eventually I am going to try to implement something similar to Google
> Groups, indexing lots of NNTP traffic. Has anyone done this before
> with Lucene?

Not that I know of, but people have used Lucene to index their email,
which is somewhat similar.

