lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matt Quail <m...@ctx.com.au>
Subject Re: Mixing database and lucene searches
Date Tue, 11 May 2004 03:45:05 GMT
Glen Stampoultzis wrote:
> Anyone have any strategies for dealing with this?  I'm wondering whether
> it's better to replicate searchable fields in the lucene index.  This means
> being very careful that updates get done in two places so it is not ideal.

If you *can* manage to update your index when the "real" database is
updaed, then indexing the database fields into the lucene index is the
way to go: because it makes searching easier (no lucene-db joins) and
faster (way faster!).

You could always "batch" the updates of the index. That is, if you can
put a "last modified" date on each row of the database, then updating
the lucene index is easy (the index just needs to remember the last date
at which it was updated).

If you *really* don't want to (or can't) put all the searchable fields
into lucene, then you are going to need to do a "lucene-db" join. This
is how I do it:

Just say your DB contains rows with a primary key "pk" (say, an int;
this is a Keyword in the lucene index), a field "author" (not indexed)
and a text blob "text" (this is what you have indexed in Lucene).

There are two basic joins you need to do, an "and" join and an "or" join.

"And" join: if you want to do a query like "select rows where
name='matt' and text contains 'foo'", then the psuedo code is:

---
Hits hits = searcher.search(new TermQuery("text", "foo")
Set hitPKs = new Set();
for each doc in hits:
   hitPKs.put(doc.getField("pk"))

rsPKs = new Set();
ResultSet rs = sql.execute("select PK from TABLE where name='matt'")
for each row in rs:
   rsPKs.put(rs.get("pk"));

Set results = intersection(rsPKs, hitPKs);
---

Similarly, the "or" query uses union() not intersection(). If you want
to preserve the order of the Hits, then use a List or LinkedSet instead.

The code for a lucene-db is very annoying: try to put all your
searchable fields into an Index instead :D

=Matt





---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message