lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Phil brunet" <>
Subject Re: Mixing database and lucene searches
Date Tue, 11 May 2004 07:55:02 GMT
Hi to all.

Just one word about "crossing" Lucene results and DB results.  Maybe it is 
obvious but maybe not (sorry for my english).

Once you got the Lucene results, a good way is to get the PKs that you have 
stored in the Lucene and use then directly in the SQL query with a IN 

In that way, if your Lucene query "retrieves" 100 results, your SQL query 
would never retrieve more than 100 results (it avoids you a costly SQL fetch 
and an intersection).

Using a fixed number of bind variables in the SQL query  guaranties you the 
best SQL query execution query (it cuts the db parsinf time).

If you can't guaranty a fixed number of Lucene results (and it is often the 
case !), a good way is to duplicate the last PK and so to  round to a fixed 



>From: "Glen Stampoultzis" <>
>Reply-To: "Lucene Users List" <>
>Subject: Re: Mixing database and lucene searches
>Date: Tue, 11 May 2004 14:39:23 +1000
>Thanks for the help.
>I think I'll got with putting all the fields into lucene.
>Just one comment about your strategy for combining db and lucene searches.
>It seems that it would slow down significantly the larger the results,
>although I can't see a better way to go about it.  For example if the 
>search matched 100 records and the database searched 1,000,000 records
>you're faced with iterating through a set of 1,000,000 records for a result
>that couldn't possibly be more than 100 actual records.  I guess there's 
>really much you can do about that.
>"Matt Quail" <> wrote in message
> > Glen Stampoultzis wrote:
> > > Anyone have any strategies for dealing with this?  I'm wondering 
> > > it's better to replicate searchable fields in the lucene index.  This
> > > being very careful that updates get done in two places so it is not
> >
> > If you *can* manage to update your index when the "real" database is
> > updaed, then indexing the database fields into the lucene index is the
> > way to go: because it makes searching easier (no lucene-db joins) and
> > faster (way faster!).
> >
> > You could always "batch" the updates of the index. That is, if you can
> > put a "last modified" date on each row of the database, then updating
> > the lucene index is easy (the index just needs to remember the last date
> > at which it was updated).
> >
> > If you *really* don't want to (or can't) put all the searchable fields
> > into lucene, then you are going to need to do a "lucene-db" join. This
> > is how I do it:
> >
> > Just say your DB contains rows with a primary key "pk" (say, an int;
> > this is a Keyword in the lucene index), a field "author" (not indexed)
> > and a text blob "text" (this is what you have indexed in Lucene).
> >
> > There are two basic joins you need to do, an "and" join and an "or" 
> >
> > "And" join: if you want to do a query like "select rows where
> > name='matt' and text contains 'foo'", then the psuedo code is:
> >
> > ---
> > Hits hits = TermQuery("text", "foo")
> > Set hitPKs = new Set();
> > for each doc in hits:
> >    hitPKs.put(doc.getField("pk"))
> >
> > rsPKs = new Set();
> > ResultSet rs = sql.execute("select PK from TABLE where name='matt'")
> > for each row in rs:
> >    rsPKs.put(rs.get("pk"));
> >
> > Set results = intersection(rsPKs, hitPKs);
> > ---
> >
> > Similarly, the "or" query uses union() not intersection(). If you want
> > to preserve the order of the Hits, then use a List or LinkedSet instead.
> >
> > The code for a lucene-db is very annoying: try to put all your
> > searchable fields into an Index instead :D
> >
> > =Matt
>To unsubscribe, e-mail:
>For additional commands, e-mail:

Hotmail : un compte GRATUIT qui vous suit partout et tout le temps !

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message