lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Iam Jabour <>
Subject Re: Best way to create a Lucene Index with fields that can be updated frequently, and filtering the results by this field.
Date Mon, 01 Nov 2010 21:56:11 GMT
Thanks guys!

I can estimate frequently equal to update 20% of 800k weekly. If I
just optimize one time a week  the main cost will be in this
optimization, correctly?

Erick, I will use your approach, and if it don't work wheel I try
option 3, what do you think?

Nilesh,  your approach looks good, but looks complicated too. I will
try a simple approach first.
Iam Jabour

On Mon, Nov 1, 2010 at 6:54 PM, Erick Erickson <> wrote:
> How often is "frequently"? How many updates do you expect to do in
> a day? And how quickly must those updates be reflected in the search
> results?
> 800K documents isn't all that many. I'd go with the simple approach first
> and monitor the results, #then# go to a more complex solution if you
> see a problem arising. Just update (delete/add) the documents when
> the value changes.
> Well, the cost to reindex is just about what the cost to index it orignally
> is. The old version of the document is marked deleted and the new one
> is added. It's essentially the same cost as to index a new document.
> This leaves some gaps in your index, that is the deleted docs are still in
> there, but the next optimize will compact them.
> From which you may infer that optimizing is the expensive part. I'd do that,
> say
> once daily (or even weekly).
> Erick
> On Mon, Nov 1, 2010 at 3:25 PM, Iam Jabour <> wrote:
>> Hi, I use Lucene to index my documents and search. Actually I have
>> 800k documents indexed in Lucene. Those documents have some fields:
>> Id: is a Numeric field to index the documents
>> Name: is a textual field to be stored and analyzed
>> Description: like name
>> Availability: is a numeric field to filter results. This field can be
>> updated frequently, every day.
>> My question is: What's the better way to create a filter for availability?
>> 1 - add this information to index and make a lucene filter. With this
>> approach I have to update document (remove and add, because lucene
>> 3.0.2 not have update support) every time the "availability" changes.
>> What the cost of reindex?
>> 2 - don't add this information to index, and filter the results with a
>> DB select. This approach will do a lot of selects, because I need
>> select every id from database to check availability.
>> 3 - Create a separated index with id and availability. I don't know if
>> it is a good solution, but I can create a index with static
>> information and other with information can be frequently updated. I
>> think it is better then update all document, just because some fields
>> were updated.
>> ______________
>> Iam Jabour
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message