lucene-solr-user mailing list archives

From Emir Arnautovic <emir.arnauto...@sematext.com>
Subject Re: to handle expired documents: collection alias or delete by id query
Date Thu, 23 Mar 2017 12:28:39 GMT
Hi Derek,

Both approaches have pros and cons:

1. If you do a full reindex, the PRO is that you have a clean index all 
the time, and if something goes wrong you simply don't switch the alias 
to the updated index, so your users will not notice any issues. The CON 
is that you do a full reindex every time, even if the amount of changes 
is minimal. Also, this approach is not real-time friendly if you plan to 
have more frequent update cycles.

2. If you delete from the existing index, you make minimal changes. But 
note that deleted docs are only flagged as deleted in the index and are 
removed when segments are merged. This can result in skewed statistics 
and, if you have replicas and sort by score, in different ordering 
depending on each replica's merge cycle. Running optimize after the 
update is done would solve this issue.
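
To make the two options concrete, here is a minimal Python sketch of the requests involved. It only builds the URLs and the JSON body and does not send anything. The base URL, the alias name `products` and the collection name `products_temp` are hypothetical; CREATEALIAS, the JSON `delete` command and `optimize=true` are the standard Solr calls, but please check them against the docs for your version.

```python
import json
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collection):
    """Approach 1: Collections API URL that points `alias` at `collection`."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{solr_base}/admin/collections?{urlencode(params)}"


def delete_by_id_body(ids):
    """Approach 2: JSON body to POST to /<collection>/update to delete by id."""
    return json.dumps({"delete": list(ids)})


def optimize_url(solr_base, collection):
    """URL that asks Solr to optimize (force merge) once the update is done."""
    return f"{solr_base}/{collection}/update?optimize=true"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # hypothetical host and port
    print(create_alias_url(base, "products", "products_temp"))
    print(delete_by_id_body(["p1", "p2"]))  # hypothetical product ids
    print(optimize_url(base, "products"))
```

Queries always go to the alias, so repointing it after the temp collection has been indexed and verified is what hides a failed reindex from users.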

In order to make the right decision, you have to look at the size of 
your collection, the number of deleted items, etc. You can even combine 
approaches, e.g. delete daily and do a full reindex once a week.
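
A combined schedule could be sketched roughly like this. It assumes you read `numDocs` and `maxDoc` from the index statistics Solr reports (the difference between them is the number of docs flagged as deleted but not yet merged away). The 20% threshold, the Sunday reindex day and the action names are made-up values for illustration, not recommendations.

```python
def deleted_ratio(num_docs, max_doc):
    """Fraction of the index taken up by docs flagged as deleted."""
    if max_doc <= 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def choose_action(num_docs, max_doc, weekday,
                  reindex_weekday=6, ratio_threshold=0.2):
    """Pick the maintenance step for this run.

    weekday follows datetime.date.weekday(): Monday is 0, Sunday is 6.
    reindex_weekday and ratio_threshold are hypothetical defaults.
    """
    if weekday == reindex_weekday:
        return "full_reindex"          # weekly: rebuild and switch the alias
    if deleted_ratio(num_docs, max_doc) >= ratio_threshold:
        return "delete_then_optimize"  # many deleted docs: merge them away
    return "delete_only"               # few deleted docs: leave it to normal merges


if __name__ == "__main__":
    # Tuesday, 950 live docs out of 1000 in the index
    print(choose_action(num_docs=950, max_doc=1000, weekday=1))
```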

HTH,
Emir


On 23.03.2017 07:10, Derek Poh wrote:
> Hi
>
> I have collections of products. I am doing indexing 3-4 times daily.
> Every day there are products that expire and I need to remove them 
> from these collections daily.
>
> I can think of 2 ways to do this.
> 1. Using a collection alias to switch between a main and a temp collection.
> - clear and index the temp collection
> - create alias to temp collection.
> - clear and index the main collection.
> - create alias to main collection.
>
> This way requires additional collections.
>
> 2. Get the list of expired products and generate delete-by-id queries to 
> the collections.
>
> I would like some advice on which way I should adopt.
>
>
> Derek

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

