lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Iam Jabour <iamjab...@gmail.com>
Subject Re: Best way to create a Lucene Index with fields that can be updated frequently, and filtering the results by this field.
Date Mon, 01 Nov 2010 21:56:11 GMT
Thanks guys!

I can estimate frequently equal to update 20% of 800k weekly. If I
just optimize one time a week  the main cost will be in this
optimization, correctly?

Erick, I will use your approach, and if it don't work wheel I try
option 3, what do you think?

Nilesh,  your approach looks good, but looks complicated too. I will
try a simple approach first.
______________
Iam Jabour




On Mon, Nov 1, 2010 at 6:54 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> How often is "frequently"? How many updates do you expect to do in
> a day? And how quickly must those updates be reflected in the search
> results?
>
> 800K documents isn't all that many. I'd go with the simple approach first
> and monitor the results, #then# go to a more complex solution if you
> see a problem arising. Just update (delete/add) the documents when
> the value changes.
>
> Well, the cost to reindex is just about what the cost to index it orignally
> is. The old version of the document is marked deleted and the new one
> is added. It's essentially the same cost as to index a new document.
> This leaves some gaps in your index, that is the deleted docs are still in
> there, but the next optimize will compact them.
>
> From which you may infer that optimizing is the expensive part. I'd do that,
> say
> once daily (or even weekly).
>
> HTH
> Erick
>
> On Mon, Nov 1, 2010 at 3:25 PM, Iam Jabour <iamjabour@gmail.com> wrote:
>
>> Hi, I use Lucene to index my documents and search. Actually I have
>> 800k documents indexed in Lucene. Those documents have some fields:
>>
>> Id: is a Numeric field to index the documents
>>
>> Name: is a textual field to be stored and analyzed
>>
>> Description: like name
>>
>> Availability: is a numeric field to filter results. This field can be
>> updated frequently, every day.
>>
>> My question is: What's the better way to create a filter for availability?
>>
>> 1 - add this information to index and make a lucene filter. With this
>> approach I have to update document (remove and add, because lucene
>> 3.0.2 not have update support) every time the "availability" changes.
>> What the cost of reindex?
>>
>> 2 - don't add this information to index, and filter the results with a
>> DB select. This approach will do a lot of selects, because I need
>> select every id from database to check availability.
>>
>> 3 - Create a separated index with id and availability. I don't know if
>> it is a good solution, but I can create a index with static
>> information and other with information can be frequently updated. I
>> think it is better then update all document, just because some fields
>> were updated.
>>
>> ______________
>> Iam Jabour
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message