Hi to all. Just one word about "crossing" Lucene results and DB results. Maybe it is obvious but maybe not (sorry for my english). Once you got the Lucene results, a good way is to get the PKs that you have stored in the Lucene and use then directly in the SQL query with a IN operator. In that way, if your Lucene query "retrieves" 100 results, your SQL query would never retrieve more than 100 results (it avoids you a costly SQL fetch and an intersection). Using a fixed number of bind variables in the SQL query guaranties you the best SQL query execution query (it cuts the db parsinf time). If you can't guaranty a fixed number of Lucene results (and it is often the case !), a good way is to duplicate the last PK and so to round to a fixed number. Cordialy Phil >From: "Glen Stampoultzis" >Reply-To: "Lucene Users List" >To: lucene-user@jakarta.apache.org >Subject: Re: Mixing database and lucene searches >Date: Tue, 11 May 2004 14:39:23 +1000 > >Thanks for the help. > >I think I'll got with putting all the fields into lucene. > >Just one comment about your strategy for combining db and lucene searches. >It seems that it would slow down significantly the larger the results, >although I can't see a better way to go about it. For example if the >lucene >search matched 100 records and the database searched 1,000,000 records >you're faced with iterating through a set of 1,000,000 records for a result >that couldn't possibly be more than 100 actual records. I guess there's >not >really much you can do about that. > >Regards, > >Glen > > >"Matt Quail" wrote in message >news:40A04C41.1070400@ctx.com.au... > > Glen Stampoultzis wrote: > > > Anyone have any strategies for dealing with this? I'm wondering >whether > > > it's better to replicate searchable fields in the lucene index. This >means > > > being very careful that updates get done in two places so it is not >ideal. > > > > If you *can* manage to update your index when the "real" database is > > updaed, then indexing the database fields into the lucene index is the > > way to go: because it makes searching easier (no lucene-db joins) and > > faster (way faster!). > > > > You could always "batch" the updates of the index. That is, if you can > > put a "last modified" date on each row of the database, then updating > > the lucene index is easy (the index just needs to remember the last date > > at which it was updated). > > > > If you *really* don't want to (or can't) put all the searchable fields > > into lucene, then you are going to need to do a "lucene-db" join. This > > is how I do it: > > > > Just say your DB contains rows with a primary key "pk" (say, an int; > > this is a Keyword in the lucene index), a field "author" (not indexed) > > and a text blob "text" (this is what you have indexed in Lucene). > > > > There are two basic joins you need to do, an "and" join and an "or" >join. > > > > "And" join: if you want to do a query like "select rows where > > name='matt' and text contains 'foo'", then the psuedo code is: > > > > --- > > Hits hits = searcher.search(new TermQuery("text", "foo") > > Set hitPKs = new Set(); > > for each doc in hits: > > hitPKs.put(doc.getField("pk")) > > > > rsPKs = new Set(); > > ResultSet rs = sql.execute("select PK from TABLE where name='matt'") > > for each row in rs: > > rsPKs.put(rs.get("pk")); > > > > Set results = intersection(rsPKs, hitPKs); > > --- > > > > Similarly, the "or" query uses union() not intersection(). If you want > > to preserve the order of the Hits, then use a List or LinkedSet instead. > > > > The code for a lucene-db is very annoying: try to put all your > > searchable fields into an Index instead :D > > > > =Matt > > > > >--------------------------------------------------------------------- >To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org >For additional commands, e-mail: lucene-user-help@jakarta.apache.org > _________________________________________________________________ Hotmail : un compte GRATUIT qui vous suit partout et tout le temps ! http://g.msn.fr/FR1000/9493 --------------------------------------------------------------------- To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org For additional commands, e-mail: lucene-user-help@jakarta.apache.org