lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antony Bowesman <>
Subject Using Lucene partly as DB and 'joining' search results.
Date Fri, 11 Apr 2008 10:19:33 GMT
We're planning to archive email over many years and have been looking at using 
DB to store mail meta data and Lucene for the indexed mail data, or just Lucene 
on its own with email data and structure stored as XML and the raw message 
stored in the file system.

For some customers, the volumes are likely to be well over 1 billion mails over
10 years, so some  partitioning of data is needed.  At the moment the thoughts
are moving away from using a DB + Lucene to just Lucene along with a file system
representation of the complete message.  All searches will be against the index 
then the XML mail meta data is loaded from the file system.

The archive is read only apart from bulk deletes, but one of the requirements is 
for users to be able to label their own mail.  Given that a Lucene Document 
cannot be updated, I have thought about having a separate Lucene index that has 
just the 3 terms (or some combination of) userId + mailId + label.

That of course would mean joining searches from the main mail data index and the 
label index.

Does anyone have any experience of using Lucene this way and is it a realistic 
option of avoiding the DB at all?  I'd rather the headache of scaling just 
Lucene, which is a simple beast, than the whole bundle of 'stuff' that comes 
with the database as well.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message