lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-5795) Option to periodically delete docs based on an expiration field -- or ttl specified when indexed.
Date Wed, 26 Mar 2014 02:21:19 GMT

     [ https://issues.apache.org/jira/browse/SOLR-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-5795:
---------------------------

    Attachment: SOLR-5795.patch

Updated patch:

* javadocs
* refactor some redundant code
* add support for configuring a "ttlParamName" that can be used instead of (or as a default
to) the "ttlFieldName"
* add scaffolding for the "only run on overseer" logic (waiting for SOLR-5823)
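
To illustrate the "instead of (or as a default to)" relationship between the two options, here is a
minimal Python sketch of one way the lookup could work. It is not the code in the patch: the
function name, the field/param names ("ttl_s", "ttl") and the TTL strings are all hypothetical,
and it assumes a per-document field value wins over the request param.

```python
from typing import Mapping, Optional


def resolve_ttl(
    doc: Mapping[str, str],
    params: Mapping[str, str],
    ttl_field_name: Optional[str],
    ttl_param_name: Optional[str],
) -> Optional[str]:
    """Pick the TTL for one document (hypothetical helper, not from the patch).

    Assumption: a value in the document's ttl field takes precedence, and the
    request param only acts as a default when the field is absent or disabled.
    """
    if ttl_field_name is not None:
        value = doc.get(ttl_field_name)
        if value is not None:
            return value
    if ttl_param_name is not None:
        return params.get(ttl_param_name)
    return None


# The document-level value overrides the request-level default.
print(resolve_ttl({"ttl_s": "+1HOUR"}, {"ttl": "+1DAY"}, "ttl_s", "ttl"))
# With no ttl field configured, only the request param is consulted.
print(resolve_ttl({}, {"ttl": "+1DAY"}, None, "ttl"))
```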

There are still some TODOs, but none that I think should be a blocker, just room for improvement
and/or additional configuration.

----

Unfortunately, when I tried testing this in combination with SOLR-5823 (so only the overseer
triggers the periodic deletes) the distrib test failed repeatedly -- it timed out waiting
for the doc to be deleted and it never was.  I spent a bit of time looking through the logs,
and I can't make sense of it:

* the overseer logic seemed to be working: periodic deletes were being logged from one node,
while the other nodes each logged once that they weren't the overseer and weren't going to manage
the deletes
* the deleteByQuery commands seemed to be getting forwarded -- I was seeing deleteByQuery commands
that had TOLEADER and FROMLEADER params getting logged.
* likewise the commit commands also seemed to be getting forwarded

...and yet still, the query loop for the doc that should be expired continuously got numFound=1


I'll dig in more tomorrow with fresh eyes.

In the meantime: feedback on the patch -- particularly the javadocs even if folks don't want
to wade into the code -- would be appreciated.

> Option to periodically delete docs based on an expiration field -- or ttl specified when
indexed.
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5795
>                 URL: https://issues.apache.org/jira/browse/SOLR-5795
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-5795.patch, SOLR-5795.patch, SOLR-5795.patch, SOLR-5795.patch
>
>
> A question I get periodically from people is how to automatically remove documents from
a collection at a certain time (or after a certain amount of time).  
> Excluding from search results using a filter query on a date field is trivial, but you
still have to periodically send a deleteByQuery to clean up those older "expired" documents.
 And in the case where you want all documents to auto-expire some fixed amount of time after
they were indexed, you still have to set up a simple UpdateProcessor to set that expiration
date.  So I've been thinking it would be nice if there was a simple way to configure Solr
to do it all for you.
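
For context, the manual approach described above might look roughly like the following Python
sketch. It only builds the values a client would send to Solr and does not talk to a server.
The field name "expire_at_dt" and the helper names are assumptions made for illustration; the
range-query syntax and the JSON delete-by-query body follow standard Solr conventions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical name for the date field that holds each doc's expiration time.
EXPIRE_FIELD = "expire_at_dt"


def expiration_for(indexed_at: datetime, ttl: timedelta) -> str:
    """Expiration timestamp in Solr's UTC date format.

    `indexed_at` must be timezone-aware; this is the value a simple
    UpdateProcessor (or the client) would have to set on each document.
    """
    expires = (indexed_at + ttl).astimezone(timezone.utc)
    return expires.strftime("%Y-%m-%dT%H:%M:%SZ")


def hide_expired_fq(field: str = EXPIRE_FIELD) -> str:
    """Filter query that excludes expired docs from search results."""
    return f"-{field}:[* TO NOW]"


def delete_expired_body(field: str = EXPIRE_FIELD) -> dict:
    """JSON update body for the periodic deleteByQuery cleanup."""
    return {"delete": {"query": f"{field}:[* TO NOW]"}}


indexed = datetime(2014, 3, 26, 2, 21, 19, tzinfo=timezone.utc)
print(expiration_for(indexed, timedelta(hours=1)))
print(hide_expired_fq())
print(delete_expired_body())
```

Every piece here is something the client has to remember to do (set the date, add the filter,
schedule the delete), which is the bookkeeping this issue proposes to move into Solr itself.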



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

