lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Glen Stampoultzis" <>
Subject Re: Mixing database and lucene searches
Date Tue, 11 May 2004 04:39:23 GMT
Thanks for the help.

I think I'll got with putting all the fields into lucene.

Just one comment about your strategy for combining db and lucene searches.
It seems that it would slow down significantly the larger the results,
although I can't see a better way to go about it.  For example if the lucene
search matched 100 records and the database searched 1,000,000 records
you're faced with iterating through a set of 1,000,000 records for a result
that couldn't possibly be more than 100 actual records.  I guess there's not
really much you can do about that.



"Matt Quail" <> wrote in message
> Glen Stampoultzis wrote:
> > Anyone have any strategies for dealing with this?  I'm wondering whether
> > it's better to replicate searchable fields in the lucene index.  This
> > being very careful that updates get done in two places so it is not
> If you *can* manage to update your index when the "real" database is
> updaed, then indexing the database fields into the lucene index is the
> way to go: because it makes searching easier (no lucene-db joins) and
> faster (way faster!).
> You could always "batch" the updates of the index. That is, if you can
> put a "last modified" date on each row of the database, then updating
> the lucene index is easy (the index just needs to remember the last date
> at which it was updated).
> If you *really* don't want to (or can't) put all the searchable fields
> into lucene, then you are going to need to do a "lucene-db" join. This
> is how I do it:
> Just say your DB contains rows with a primary key "pk" (say, an int;
> this is a Keyword in the lucene index), a field "author" (not indexed)
> and a text blob "text" (this is what you have indexed in Lucene).
> There are two basic joins you need to do, an "and" join and an "or" join.
> "And" join: if you want to do a query like "select rows where
> name='matt' and text contains 'foo'", then the psuedo code is:
> ---
> Hits hits = TermQuery("text", "foo")
> Set hitPKs = new Set();
> for each doc in hits:
>    hitPKs.put(doc.getField("pk"))
> rsPKs = new Set();
> ResultSet rs = sql.execute("select PK from TABLE where name='matt'")
> for each row in rs:
>    rsPKs.put(rs.get("pk"));
> Set results = intersection(rsPKs, hitPKs);
> ---
> Similarly, the "or" query uses union() not intersection(). If you want
> to preserve the order of the Hits, then use a List or LinkedSet instead.
> The code for a lucene-db is very annoying: try to put all your
> searchable fields into an Index instead :D
> =Matt

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message