lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Best way to create a Lucene Index with fields that can be updated frequently, and filtering the results by this field.
Date Mon, 01 Nov 2010 20:54:10 GMT
How often is "frequently"? How many updates do you expect to do in
a day? And how quickly must those updates be reflected in the search
results?

800K documents isn't all that many. I'd go with the simple approach first
and monitor the results, #then# go to a more complex solution if you
see a problem arising. Just update (delete/add) the documents when
the value changes.

Well, the cost to reindex a document is about the same as the cost to index
it originally. The old version of the document is marked deleted and the new
one is added, so it's essentially the same cost as indexing a new document.
This leaves some gaps in your index (the deleted docs are still in
there), but the next optimize will compact them.

From which you may infer that optimizing is the expensive part. I'd do that,
say, once daily (or even weekly).
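
To make the delete/add and optimize behaviour concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class ToyIndex and all of its methods are hypothetical names made up for illustration, and the linear scan stands in for Lucene's term lookup. In Lucene 3.x the corresponding real calls would be IndexWriter.updateDocument(Term, Document), which does the delete and the add in one step, and IndexWriter.optimize().

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an append-only index with delete marks. NOT Lucene's implementation. */
class ToyIndex {

    /** One physical slot: a document version plus a "deleted" mark. */
    private static final class Slot {
        final int id;
        final int availability;
        boolean deleted;

        Slot(int id, int availability) {
            this.id = id;
            this.availability = availability;
        }
    }

    private final List<Slot> slots = new ArrayList<>();

    /** Adding always appends a new slot; the cost is that of indexing one document. */
    void add(int id, int availability) {
        slots.add(new Slot(id, availability));
    }

    /** Update = mark every old version deleted, then append the new version. */
    void update(int id, int availability) {
        for (Slot s : slots) {
            if (s.id == id) {
                s.deleted = true; // the old version stays in place, only flagged
            }
        }
        add(id, availability);
    }

    /** Searching skips slots that are marked deleted. */
    List<Integer> idsWithAvailability(int availability) {
        List<Integer> hits = new ArrayList<>();
        for (Slot s : slots) {
            if (!s.deleted && s.availability == availability) {
                hits.add(s.id);
            }
        }
        return hits;
    }

    /** Physical size, including versions that are only marked deleted. */
    int totalSlots() {
        return slots.size();
    }

    /** Number of documents a search can still see. */
    int liveDocs() {
        int live = 0;
        for (Slot s : slots) {
            if (!s.deleted) {
                live++;
            }
        }
        return live;
    }

    /** Compaction (the "optimize" step): rewrites the list without deleted slots. */
    void optimize() {
        slots.removeIf(s -> s.deleted);
    }
}
```

Adding two documents and then updating one leaves three physical slots but only two live documents; after optimize() the slot count drops back to two. The update itself is cheap, and the rewrite in optimize() is the part worth scheduling for a quiet time.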

HTH
Erick

On Mon, Nov 1, 2010 at 3:25 PM, Iam Jabour <iamjabour@gmail.com> wrote:

> Hi, I use Lucene to index and search my documents. I currently have
> 800k documents indexed in Lucene. Those documents have some fields:
>
> Id: is a Numeric field to index the documents
>
> Name: is a textual field to be stored and analyzed
>
> Description: like name
>
> Availability: is a numeric field to filter results. This field can be
> updated frequently, every day.
>
> My question is: what is the best way to create a filter for availability?
>
> 1 - Add this information to the index and use a Lucene filter. With this
> approach I have to update the document (remove and add, because Lucene
> 3.0.2 does not have update support) every time the "availability" changes.
> What is the cost of reindexing?
>
> 2 - Don't add this information to the index, and filter the results with a
> DB select. This approach will do a lot of selects, because I need to
> select every id from the database to check availability.
>
> 3 - Create a separate index with id and availability. I don't know if
> it is a good solution, but I can create one index with static
> information and another with information that is frequently updated. I
> think it is better than updating the whole document just because some
> fields were updated.
>
> ______________
> Iam Jabour
