lucene-solr-user mailing list archives

From Dorian Hoxha <dorian.ho...@gmail.com>
Subject Re: ttl on merge-time possible somehow ?
Date Fri, 16 Dec 2016 18:13:34 GMT
On Fri, Dec 16, 2016 at 4:42 PM, Shawn Heisey <apache@elyograg.org> wrote:

> On 12/16/2016 12:54 AM, Dorian Hoxha wrote:
> > I did some search for TTL on solr, and found only a way to do it with
> > a delete-query. But that ~sucks, because you have to do a lot of
> > inserts (and queries).
>
> You're going to have to be very specific about what you want Solr to do.
>
> > The other (kinda better) way to do it is to set a collection-level
> > ttl, and when indexes are merged, they will drop the documents that
> > have expired in the new merged segment. On the client, I will make
> > sure to do date-range queries so I don't get back old documents. So:
> > 1. is there a way to easily modify the segment-merger (or better way?)
> > to do that ?
>
> Does the following describe the feature you're after?
>
> https://lucidworks.com/blog/2014/05/07/document-expiration/
>
> If this is what you're after, this is *Solr* functionality.  Segment
> merging is *Lucene* functionality.  Lucene cannot remove documents
> during merge until they have been deleted.  It is Solr that handles
> deleting documents after they expire.  Lucene is not aware of the
> expiration concept.
>
Yep, that's what came up in my search. See how TTL works in hbase/cassandra/
rocksdb <https://github.com/facebook/rocksdb/wiki/Time-to-Live>. There
isn't a "delete old docs" query; instead, old docs are dropped by the storage
engine when it merges. Looks like this needs to be a Lucene module which can
then be configured by Solr?
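
As a rough illustration of the merge-time TTL idea described above, here is a
toy sketch in Python. It is not Lucene, Solr, or RocksDB code; `Doc`,
`merge_segments`, and the `written_at` field are hypothetical names, and
segments are modelled as plain lists. It only shows the principle: expired
documents are never copied into the merged segment, so no explicit delete
query is issued.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Doc:
    doc_id: str
    written_at: float  # write time, in seconds (hypothetical field)


def merge_segments(segments: Iterable[List[Doc]],
                   ttl_seconds: float,
                   now: float) -> List[Doc]:
    """Merge segments into one, skipping docs whose age has reached the TTL."""
    merged: List[Doc] = []
    for segment in segments:
        for doc in segment:
            # Only live documents are carried over into the new segment.
            if now - doc.written_at < ttl_seconds:
                merged.append(doc)
    return merged


old_segment = [Doc("a", 0.0), Doc("b", 50.0)]
new_segment = [Doc("c", 90.0)]
merged = merge_segments([old_segment, new_segment], ttl_seconds=60.0, now=100.0)
print([d.doc_id for d in merged])
```

Until a merge actually runs, expired documents are still physically present,
which is why readers would also need to filter on the timestamp.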


> > 2. is there a way to support this also on get ? looks like I can use
> > realtimeget + filter query and it should work based on documentation
>
> Realtime get allows you to retrieve documents that have been indexed but
> not yet committed.  I doubt that deleted documents or document
> expiration affects RTG at all.  We would need to know exactly what you
> want to get working here before we can say whether or not you're right
> when you say "it should work."
>
Just like in hbase, cassandra, and rocksdb: when you "select" a row/document
that has expired, it still exists in storage but isn't returned by the db,
because the db checks the timestamp and sees that it has expired. Looks like
this also needs to be in Lucene?
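
The read-side check can be sketched the same way. Again this is a toy Python
example under stated assumptions, not how Lucene or Solr realtime get behaves:
`get_live` and the in-memory `store` (a dict mapping an id to a
`(written_at, value)` pair) are hypothetical. The expired entry stays in
storage, but the lookup hides it.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store: doc id -> (write time in seconds, value)
Store = Dict[str, Tuple[float, str]]


def get_live(store: Store, doc_id: str,
             ttl_seconds: float, now: float) -> Optional[str]:
    """Return the value for doc_id, or None if it is missing or expired."""
    entry = store.get(doc_id)
    if entry is None:
        return None
    written_at, value = entry
    # The entry is still stored, but it is not returned once the TTL has passed.
    if now - written_at >= ttl_seconds:
        return None
    return value


store: Store = {"a": (0.0, "old doc"), "b": (50.0, "fresh doc")}
print(get_live(store, "a", ttl_seconds=60.0, now=100.0))
print(get_live(store, "b", ttl_seconds=60.0, now=100.0))
```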

>
> Thanks,
> Shawn
>
Makes more sense?
