lucene-java-user mailing list archives

From jt oob
Subject Re: The best way forward
Date Tue, 04 Nov 2003 11:09:27 GMT
Thank you for the replies!

It currently looks like my indexes will be about 12GB when the current
run finishes.

I have spotted a tool on the Lucene site for listing the most
frequently occurring words in the index. Currently I am using the
defaultAnalyzer stoplist; I should probably use a more comprehensive one.
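For reference, here is a minimal plain-Java sketch of what such a frequent-terms listing does. It assumes the (term, document frequency) pairs have already been read out of the index. In Lucene 1.x that means walking the TermEnum from IndexReader.terms() and calling docFreq() on each entry. The class name and sample counts are made up for illustration. The sketch keeps only the top n entries in a bounded min-heap instead of sorting every term, which matters when an index holds millions of distinct terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class HighFreqTermsSketch {

    /** Returns the n terms with the highest document frequency, most frequent first. */
    static List<String> topTerms(Map<String, Integer> docFreqs, int n) {
        // Min-heap ordered by frequency: the root is the weakest of the current top n.
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the least frequent candidate
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result); // the heap drains least frequent first
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies, not taken from a real index.
        Map<String, Integer> df = new HashMap<>();
        df.put("the", 950);
        df.put("lucene", 120);
        df.put("of", 870);
        df.put("index", 300);
        df.put("and", 910);
        System.out.println(topTerms(df, 3)); // [the, and, of]
    }
}
```

Terms that come out at the top of a listing like this are the natural candidates for a more comprehensive stoplist.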

Is there a way of applying a stoplist after the index has been
created, removing all occurrences of the new stoplist words?
I could then write a new Analyzer with the new stoplist for adding new
documents to the index.
Am I doomed to reindexing with a better stoplist?
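Here is a plain-Java sketch of the filtering step that a stoplist-aware Analyzer performs on text before it is indexed. It deliberately avoids Lucene's TokenStream classes. In Lucene the same job is done by a StopFilter in the analyzer chain. The tokenizing regex and the contents of STOP_WORDS are hypothetical. A filter like this only affects text it actually analyzes, so it does nothing to terms already written to an existing index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopFilterSketch {

    // Hypothetical extended stoplist, e.g. seeded from the most frequent terms in the index.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "and", "of", "a", "to", "re"));

    /** Lower-cases the text, splits on runs of non-letter/non-digit characters, drops stop words. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // split() yields an empty leading token when the text starts with a delimiter.
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Re: The best way forward", STOP_WORDS)); // [best, way, forward]
    }
}
```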

In view of the index size, I am going to see how well the kernel
caching performs, as the index probably won't fit entirely into memory
once the operating system and other system processes have taken their
bite of the available memory.

Eventually I am going to try to implement something similar to Google
Groups, indexing lots of NNTP traffic. Has anyone done this before with
Lucene?
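One possible first step for indexing news traffic is to split each raw article into header fields and a body. The sketch follows the Netnews article layout (RFC 1036, later RFC 5536): header lines, then a blank line, then the body. A header line that begins with whitespace continues the previous field. The helper below is hypothetical and not part of Lucene. Each parsed header (for example subject, from, newsgroups) could then be stored as its own field, with the body stored as a tokenized field.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class NntpArticleSketch {

    /** Parsed article: header fields keyed by lower-cased name, plus the body text. */
    static final class Article {
        final Map<String, String> headers = new LinkedHashMap<>();
        String body = "";
    }

    /** Splits a raw article at the first blank line into headers and body. */
    static Article parse(String raw) {
        Article article = new Article();
        String[] lines = raw.split("\r?\n", -1);
        int i = 0;
        String lastName = null;
        for (; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                i++; // skip the separator line; everything after it is the body
                break;
            }
            if ((line.startsWith(" ") || line.startsWith("\t")) && lastName != null) {
                // Folded header: append to the previous field's value.
                article.headers.merge(lastName, " " + line.trim(), String::concat);
                continue;
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                lastName = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                article.headers.put(lastName, line.substring(colon + 1).trim());
            }
        }
        // Re-join the remaining lines as the body.
        StringBuilder body = new StringBuilder();
        for (int j = i; j < lines.length; j++) {
            if (j > i) {
                body.append('\n');
            }
            body.append(lines[j]);
        }
        article.body = body.toString();
        return article;
    }

    public static void main(String[] args) {
        String raw = "From: someone@example.com\n"
            + "Newsgroups: comp.lang.java\n"
            + "Subject: Re: The best way\n"
            + "  forward\n"
            + "\n"
            + "Body line one\n"
            + "Body line two";
        Article a = parse(raw);
        System.out.println(a.headers.get("subject")); // Re: The best way forward
        System.out.println(a.body);
    }
}
```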

Thanks again,
