lucene-java-user mailing list archives

From Oscar Picasso <>
Subject Using lucene to speed up queries on a relational db
Date Wed, 12 Apr 2006 13:07:46 GMT

I have a PostgreSQL table that is expected to hold around 20 million rows.

- The structure is the following:

code int              -- can take one of 100 values
property varchar(250) -- can take one of 5000 values
param01 char(10)      -- can take one of 10 values
param02 char(10)      -- can take one of 10 values
[ 20 similar columns ]
param20 char(10)      -- can take one of 10 values
keywords text         -- 0 to 15 keywords (any word from a human language such as English can be a keyword)

- Queries will filter on anywhere from one to all of the table's columns, combined with an AND operator.

I find it very difficult to optimize this kind of query in the relational database because
there are too many possible column combinations to create useful indexes for all of them.

As most of the columns take values from a small set, I thought it would be more efficient to use Lucene
to perform this kind of query.
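
To make the idea concrete, here is a minimal sketch of what I have in mind, written in plain Python rather than against the Lucene API. It only illustrates the general inverted-index approach: each (field, value) pair maps to the list of row ids that contain it, and an AND query intersects those lists. The names `build_index` and `search_and` and the sample rows are made up for illustration; this is not how Lucene is implemented internally.

```python
from collections import defaultdict

def build_index(rows):
    """Map each (field, value) pair to the list of row ids that contain it."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        for field, value in row.items():
            # keywords is multi-valued; every other column holds a single value
            values = value if isinstance(value, (list, tuple, set)) else [value]
            for v in values:
                index[(field, v)].append(row_id)
    return index

def search_and(index, criteria):
    """Return the row ids that match every (field, value) pair in criteria."""
    postings = [index.get(item, []) for item in criteria.items()]
    if not postings:
        return []  # an empty query returns nothing in this sketch
    postings.sort(key=len)  # start from the most selective term
    result = set(postings[0])
    for plist in postings[1:]:
        result.intersection_update(plist)
        if not result:
            break  # no row can match once the intersection is empty
    return sorted(result)

# Hypothetical sample rows, using a few of the columns described above
rows = [
    {"code": 1, "property": "color", "param01": "a", "keywords": ["red", "car"]},
    {"code": 1, "property": "size", "param01": "a", "keywords": ["big", "car"]},
    {"code": 2, "property": "color", "param01": "b", "keywords": ["red"]},
]
index = build_index(rows)
matches = search_and(index, {"code": 1, "param01": "a", "keywords": "car"})
```

The point of the sketch is that no per-combination index is needed: any subset of columns can be queried by intersecting the lists for the values involved.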

Initial tests with around 200000 documents/rows are good.

But here is my concern.

What performance can I expect for queries over 20 million documents/rows that use up to 20 fields
in a BooleanQuery (each clause with Occur.MUST)?
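
For a rough sense of scale, the arithmetic below estimates how many rows each single term would match at 20 million rows, and how many rows an AND over several fields would return. It assumes values are spread uniformly and independently across rows, which is my simplification and probably not true of the real data; the helper `expected_matches` is just for illustration. It says nothing about actual query times.

```python
TOTAL_ROWS = 20_000_000

# Distinct values per column, taken from the table description above
distinct_values = {"code": 100, "property": 5000, "param01": 10}

# Rows matched by one value of each column, ASSUMING a uniform distribution
per_term = {field: TOTAL_ROWS // n for field, n in distinct_values.items()}

def expected_matches(fields, total_rows=TOTAL_ROWS):
    """Expected result size for an AND over the given fields,
    ASSUMING uniform and independent values (a simplification)."""
    expected = float(total_rows)
    for field in fields:
        expected /= distinct_values[field]
    return expected

code_and_property = expected_matches(["code", "property"])
```

Under these assumptions a single paramNN value matches about 2 million rows, a code value about 200,000, and a property value about 4,000, so the low-cardinality param columns are the ones producing very long match lists.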

Any idea?


