lucene-java-user mailing list archives

From markharw00d <>
Subject Re: Lucene vs. Database
Date Wed, 01 Oct 2008 08:22:10 GMT

Pros of keeping content only in the database:
* Only one stored copy of the data is needed (saves disk space)

Pros of storing copy of content in Lucene:

* A match is more easily explained
If you collapse multiple DB fields into a single searchable field, e.g.
the customer first name and surname database fields into a single Lucene
"name" field, it is easier to get highlighting to work with the actual
data that was searched ("name") than to piece together what that field
was made from in the DB and apply highlighting to the pieces. This is not
impossible to overcome, but it is extra work to bear in mind.

* A match result is guaranteed to be consistent
If there is a time lag between a database update and search indexing
(which there invariably is), then you could match a search on a value
that is no longer stored in the database.

* Speed of content retrieval
Retrieving Lucene docs by internal Lucene doc ids, with selective field
loading, may prove to be faster than hitting the DB with "select X from
table where key in (matchKey1, matchKey2....)". Remember that you have to
read the Lucene docs anyway to get the "matchKeyX" values for your SQL statement.
Only benchmarking will tell how much faster this is, if at all. It
depends on your doc sizes, the number of fields shown, etc.
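The "match is more easily explained" point can be illustrated with a minimal plain-Java sketch. It deliberately does not use Lucene: whitespace splitting and `<b>` tags stand in for Lucene's analyzer and highlighter, and the class name `NameHighlightSketch`, the `highlight` method and the sample name are all hypothetical. What it models is that when the collapsed "name" value is stored, query terms can be marked directly in the text that was actually searched, with no need to reassemble it from separate DB columns.

```java
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

public class NameHighlightSketch {

    // Hypothetical helper: wraps every whitespace-separated token of the stored
    // text whose lower-cased form is one of the query terms in <b>...</b>.
    // Real Lucene highlighting works from analyzed tokens and offsets; this only
    // models the idea that the stored field is exactly what was searched.
    static String highlight(String storedText, Set<String> queryTerms) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : storedText.split("\\s+")) {
            if (queryTerms.contains(token.toLowerCase(Locale.ROOT))) {
                out.add("<b>" + token + "</b>");
            } else {
                out.add(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // At index time the DB columns first_name and surname were collapsed
        // into one searchable "name" field; its value is stored as well.
        String firstName = "John";
        String surname = "Smith";
        String storedName = firstName + " " + surname;

        // Query terms are assumed to be lower-cased already, as an analyzer would.
        System.out.println(highlight(storedName, Set.of("smith")));
        // prints "John <b>Smith</b>"
    }
}
```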
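The consistency point can be modelled the same way. In this sketch (plain Java, no Lucene; the two maps standing in for the database and the index, and all names, are hypothetical), the index is built from a snapshot of a row, the row is then updated before re-indexing happens, and a search still matches the old value. Content stored in the index agrees with the match, while content fetched from the database no longer does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleIndexSketch {

    // Hypothetical "search": returns the ids whose indexed surname equals the
    // term, ignoring case.
    static List<Integer> search(Map<Integer, String> index, String term) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> e : index.entrySet()) {
            if (e.getValue().equalsIgnoreCase(term)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> database = new HashMap<>();
        database.put(42, "Smith");

        // The index is built from a snapshot of the database...
        Map<Integer, String> index = new HashMap<>(database);

        // ...and the row changes before the index is refreshed.
        database.put(42, "Jones");

        for (int id : search(index, "smith")) {
            // Content stored in the index agrees with the match.
            System.out.println("from index: " + index.get(id));       // Smith
            // Content fetched from the database no longer contains the term.
            System.out.println("from database: " + database.get(id)); // Jones
        }
    }
}
```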
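For the retrieval-speed point, the database route needs a second query built from the keys read out of the Lucene hits. A minimal sketch of building that statement, assuming JDBC-style `?` placeholders and hypothetical `customer`/`id` table and column names, might look like this:

```java
import java.util.Collections;
import java.util.List;

public class InQuerySketch {

    // Builds a parameterised "SELECT ... WHERE key IN (?, ?, ...)" statement
    // with one placeholder per matching key. Table and column names are
    // hypothetical and must come from trusted code, never from user input.
    static String buildInQuery(String selectColumns, String table,
                               String keyColumn, int keyCount) {
        if (keyCount < 1) {
            throw new IllegalArgumentException("need at least one key");
        }
        String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "SELECT " + selectColumns + " FROM " + table
                + " WHERE " + keyColumn + " IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Keys as they would be read from a stored "id" field of the top Lucene hits.
        List<String> matchKeys = List.of("17", "42", "99");
        String sql = buildInQuery("first_name, surname", "customer", "id", matchKeys.size());
        System.out.println(sql);
        // prints "SELECT first_name, surname FROM customer WHERE id IN (?, ?, ?)"
        // With JDBC, key i would then be bound via PreparedStatement.setString(i + 1, key).
    }
}
```

One consequence worth weighing alongside raw speed: an SQL `IN` query without an `ORDER BY` does not return rows in the order of the keys, so the rows usually have to be re-sorted into Lucene's relevance order afterwards.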


agatone wrote:
> Hi,
> I already asked this question on the "lucene-general" list but was advised
> to ask here too.
> I'm working on a project that has a big database in the background (some
> tables have about 1,500,000 rows). We decided to use Lucene for "faster"
> search. Our search works like most searches: you enter a search string and
> get a list of hits, each with a details link. But there is a dilemma over
> whether we should store more data in the index than is needed.
> One side of the development team insists that we should use the Lucene index
> as a kind of data store: when you get a hit and go to the details, you again
> use Lucene to find the document that matches the selected ID and take the
> data from the Lucene index. In the end you end up copying complete
> database tables into the Lucene index.
> The other side insists on storing in the index only the data that is displayed
> directly to the user in the search results list and needed for the search
> criteria. When you go to the details, you have the matching ID, so you can
> pick up that row from the database by that ID rather than search for it in
> the Lucene index.
> Can someone please describe the drawbacks and advantages of both approaches?
> Actually, can someone write down what the actual benefit of Lucene itself is,
> where and when, in a real production environment?
> It would be great if anyone could write about their experience with
> indexing and searching large amounts of data.
> Thank you

