lucene-java-user mailing list archives

From Jeff Schnitzer <j...@infohazard.org>
Subject Architecture for indexing/searching mailing list archives
Date Mon, 24 Jul 2006 08:35:29 GMT
Hi.  I'm the lead developer of SubEtha, a new open source mailing list 
manager written in Java (http://subetha.tigris.org/).  I'm working on 
archive searching at the moment.  I've used Lucene with great success in 
a previous application, but some of the characteristics of this app have 
me seeking architectural advice:

* While most installations will have only a handful of lists, a 
SourceForge-sized installation might have thousands or even tens of 
thousands of (likely sparse) lists.
* Searching is always constrained to a specific list; you never search 
through the archives of more than one list at a time.

I have a thread that wakes up periodically and indexes all newly arrived 
mail.  Which would be the best approach?

1) Build a wholly separate index per mailing list.  For each search 
request, create a new IndexSearcher on the appropriate index and run the 
query.
2) Build a wholly separate index per mailing list.  Cache IndexSearchers 
that are created when search requests come in for each mailing list.  
Close and remove IndexSearchers from the cache when a list's index gets 
updated.
3) Build a single index that holds all messages, storing the associated 
list id as a field.  Use a Filter to limit each search to a specific 
list.  Use a single cached IndexSearcher that is closed and removed when 
the update process runs.
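
To make option 2 concrete, here is a minimal sketch of a per-list searcher cache with invalidation on update. It also assumes a cap on the number of open searchers, with least-recently-used eviction, which is an addition not described in the options above. SearcherCache, maxOpen, and the opener function are hypothetical names; the type parameter S stands in for IndexSearcher so the sketch needs nothing beyond the JDK.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Bounded cache of per-list searchers (hypothetical helper, not part of Lucene).
 * S stands in for IndexSearcher; any Closeable resource works.
 */
class SearcherCache<S extends Closeable> {
    private final int maxOpen;                    // assumed cap on open searchers
    private final Function<String, S> opener;     // opens a searcher for a list id
    private final LinkedHashMap<String, S> open;

    SearcherCache(int maxOpen, Function<String, S> opener) {
        this.maxOpen = maxOpen;
        this.opener = opener;
        // accessOrder=true: iteration runs from least to most recently used
        this.open = new LinkedHashMap<>(16, 0.75f, true);
    }

    /** Returns the cached searcher for the list, opening one on a miss. */
    synchronized S get(String listId) {
        S searcher = open.get(listId);
        if (searcher == null) {
            searcher = opener.apply(listId);
            open.put(listId, searcher);
            evictIfNeeded();
        }
        return searcher;
    }

    /** Called by the indexing thread after it updates a list's index. */
    synchronized void invalidate(String listId) {
        closeQuietly(open.remove(listId));
    }

    synchronized int size() {
        return open.size();
    }

    // Closes least-recently-used searchers until the cap is respected.
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, S>> it = open.entrySet().iterator();
        while (open.size() > maxOpen && it.hasNext()) {
            S eldest = it.next().getValue();
            it.remove();
            closeQuietly(eldest);
        }
    }

    private static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException e) {
            // best effort: the entry is already gone from the cache
        }
    }
}
```

The sketch closes a searcher as soon as it is evicted or invalidated. It does not handle the case where another thread is still running a query on that searcher, which would need something like reference counting.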

I'm guessing that #2 is the right answer, but I'm a little worried about 
what might happen on a server that indexes 10,000 lists.  In a 
long-running process, this could result in 10,000 cached 
IndexSearchers.  Too many open file handles?  Does an IndexSearcher 
consume much memory?  It's fair to say that anyone who wishes to have 
this kind of capacity will have to do some tuning of the OS parameters, 
but I would like to understand the bounds of the problem a bit better.
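
One rough way to frame the file-handle question is a back-of-envelope product, assuming each cached searcher keeps its index's files open. The sketch below is only arithmetic; HandleEstimate and every input (segments per index, files per segment, the per-process limit) are hypothetical and would have to be measured on a real installation.

```java
/** Back-of-envelope estimate of open file handles; all inputs are assumptions. */
final class HandleEstimate {
    private HandleEstimate() {
    }

    /** Handles held if every cached searcher keeps all of its index files open. */
    static long openHandles(long cachedSearchers, long segmentsPerIndex, long filesPerSegment) {
        return cachedSearchers * segmentsPerIndex * filesPerSegment;
    }

    /** True if the estimate fits under the given per-process descriptor limit. */
    static boolean fits(long handles, long perProcessLimit) {
        return handles <= perProcessLimit;
    }
}
```

For example, HandleEstimate.openHandles(10000, 3, 8) gives 240,000 under those made-up inputs. Lucene's compound file format, which packs a segment's files into one, would shrink the third factor considerably.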

Any advice?

Thanks,
Jeff Schnitzer
SubEtha Mailing List Manager - http://subetha.tigris.org/

