lucene-dev mailing list archives

From Simon Willnauer
Subject Re: Continuous stream indexing and time-based segment management
Date Tue, 19 Jun 2012 19:44:34 GMT
On Tue, Jun 19, 2012 at 6:42 PM, mark harwood wrote:
> There are a number of scenarios where Lucene might be used to index a fixed time range
> on a continuous stream of data, e.g. a news feed.
> In these scenarios I imagine the following facilities would be useful:
> a) A MergePolicy that organized content into segments on the basis of increasing time
> units, e.g. 5 min -> 10 min -> 1 hour -> 1 day
> b) The ability to drop entire segments e.g. the day-level segment from exactly a week

You can do that by subclassing IW (IndexWriter) and calling some
package-private APIs / members. We can certainly make that easier, but I
personally don't want to open this up as a public API. I can certainly
imagine having a protected API that allows dropping entire segments.
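
To make a) and b) a bit more concrete, here is a minimal sketch of the bookkeeping involved. It is plain Java and deliberately uses no Lucene classes: Segment, TIERS, tierOf and dropExpired are hypothetical names made up for illustration, and the tier sizes are simply the ones from the example above. A real implementation would put the tiering logic in a custom MergePolicy and the dropping logic behind whatever IndexWriter hook ends up being exposed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Toy model of time-tiered segments. Hypothetical; not a Lucene API. */
final class TimeTieredSegments {

    /** Stand-in for an index segment covering the time range [start, end). */
    static final class Segment {
        final Instant start;
        final Instant end;

        Segment(Instant start, Instant end) {
            this.start = start;
            this.end = end;
        }

        Duration span() {
            return Duration.between(start, end);
        }
    }

    /** Tier sizes taken from the example: 5 min -> 10 min -> 1 hour -> 1 day. */
    static final List<Duration> TIERS = List.of(
            Duration.ofMinutes(5),
            Duration.ofMinutes(10),
            Duration.ofHours(1),
            Duration.ofDays(1));

    /** Index of the smallest tier that can hold the segment (last tier if none can). */
    static int tierOf(Segment s) {
        for (int i = 0; i < TIERS.size(); i++) {
            if (s.span().compareTo(TIERS.get(i)) <= 0) {
                return i;
            }
        }
        return TIERS.size() - 1;
    }

    /** Keeps only segments that end after (now - retention); older ones are dropped whole. */
    static List<Segment> dropExpired(List<Segment> segments, Instant now, Duration retention) {
        Instant cutoff = now.minus(retention);
        List<Segment> kept = new ArrayList<>();
        for (Segment s : segments) {
            if (s.end.isAfter(cutoff)) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

The point of dropping by segment is that expiry becomes a cheap whole-file operation rather than a delete-by-query over a week's worth of documents, which only works if the merge policy never mixes documents from different time windows into one segment.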


> c) Various new analysis functions comparing term frequencies across time, e.g. discovery
> of "trending" topics.
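
As a rough illustration of c), one simple way to compare term frequencies across time is to score each term by the ratio of its relative frequency in a recent window to its relative frequency in an older baseline window. The sketch below is plain Java over term-count maps; TrendingTerms, trendScore and trending are hypothetical names, the add-one smoothing is an assumption chosen to avoid division by zero, and how the counts are read from the index is left open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Toy "trending terms" scorer over two time windows. Hypothetical; not a Lucene API. */
final class TrendingTerms {

    /** Ratio of smoothed relative frequencies: recent window vs. baseline window. */
    static double trendScore(long recentCount, long recentTotal, long baseCount, long baseTotal) {
        double recentRate = (recentCount + 1.0) / (recentTotal + 1.0);
        double baseRate = (baseCount + 1.0) / (baseTotal + 1.0);
        return recentRate / baseRate;
    }

    /** Terms from the recent window whose score is at least minScore, sorted alphabetically. */
    static List<String> trending(Map<String, Long> recent, Map<String, Long> baseline, double minScore) {
        long recentTotal = sum(recent);
        long baseTotal = sum(baseline);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : recent.entrySet()) {
            long baseCount = baseline.getOrDefault(e.getKey(), 0L);
            if (trendScore(e.getValue(), recentTotal, baseCount, baseTotal) >= minScore) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    /** Total number of term occurrences in a window. */
    private static long sum(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        return total;
    }
}
```

With time-aligned segments, the two windows would map naturally onto groups of segments, so the counts for each window could be gathered per segment rather than by filtering documents on a date field.
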
> I can see that a) could be implemented using a custom MergePolicy and c) can be done
> via existing APIs, but I'm not sure if there is a way to simply drop entire segments currently?
> Anyone else had thoughts in this area?
> Cheers
> Mark