lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Konstantyn Smirnov <inject...@yahoo.com>
Subject Best Practice for Lucene Search
Date Wed, 11 Feb 2009 14:05:06 GMT

In the beginning of the development, I was also facing a choice to mirror the
documents in DB/index.

But when the number of raws reached the mark of 7 mio, the query like 

        "select count(id) from documentz" 

(using PostgresQL) would take ages (ok, about 10 minutes!!! ), it became
clear to me, that something is not right with that approach :confused:.

The other reasons to name a few, would be the need to run the data import
(almost) twice for the index and DB, and then synchronize them in case of
changes.

At the moment, I have a set-up of 6 physical indieces, 15 GB each and in
total I have like 46 mio documents, and can say that I'm pretty happy with
the search performance.

Ah, almost forgot! Those 46 mio documents represent around 100 different
sources (field-structures), and would need to be persisted in 100 different
DB-tables. Also a whole lot of new sources are expected to come, and be
added into the stack ONLINE w/o the server restart! 

Using mixed DB/index solution it would be nightmare to maintain that, but a
single Lucene index copes with the task fast and easy.

So, my vote for text-only datas goes clearly to Lucene-only solution :)
-- 
View this message in context: http://www.nabble.com/Best-Practice-for-Lucene-Search-tp21748839p21955474.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message