lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Using Lucene partly as DB and 'joining' search results.
Date Fri, 11 Apr 2008 16:05:57 GMT
Op Friday 11 April 2008 13:49:59 schreef Mathieu Lecarme:
> Antony Bowesman a écrit :
> > We're planning to archive email over many years and have been
> > looking at using DB to store mail meta data and Lucene for the
> > indexed mail data, or just Lucene on its own with email data and
> > structure stored as XML and the raw message stored in the file
> > system.
> >
> > For some customers, the volumes are likely to be well over 1
> > billion mails over
> > 10 years, so some  partitioning of data is needed.  At the moment
> > the thoughts
> > are moving away from using a DB + Lucene to just Lucene along with
> > a file system
> > representation of the complete message.  All searches will be
> > against the index then the XML mail meta data is loaded from the
> > file system.
> >
> > The archive is read only apart from bulk deletes, but one of the
> > requirements is for users to be able to label their own mail. 
> > Given that a Lucene Document cannot be updated, I have thought
> > about having a separate Lucene index that has just the 3 terms (or
> > some combination of) userId + mailId + label.
> >
> > That of course would mean joining searches from the main mail data
> > index and the label index.
> >
> > Does anyone have any experience of using Lucene this way and is it
> > a realistic option of avoiding the DB at all?  I'd rather the
> > headache of scaling just Lucene, which is a simple beast, than the
> > whole bundle of 'stuff' that comes with the database as well.
>
> Use Filter and BitSet.
>  From the personnal data, you build a Filter
> (http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Fil
>ter.html) wich is used in the main index.

With 1 billion mails, and possibly a Filter per user, you may want to
use more compact filters than BitSets, which is currently possible
in the development trunk of lucene.

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message