lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Continuous stream indexing and time-based segment management
Date Tue, 19 Jun 2012 20:11:33 GMT
If you are willing/able to close the IndexWriter, it's easy to drop
segments by reading the SegmentInfos, editing, and writing back.

Mike McCandless

http://blog.mikemccandless.com
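
To make the read/edit/write-back idea concrete, here is a minimal sketch in plain Java. It makes no Lucene calls: SegmentEntry and SegmentListEditor are hypothetical stand-ins for the per-segment metadata held in SegmentInfos, and the maxTimestampMillis field assumes the application tracks the newest document time per segment. In a real index the edited list would still have to be written back while the IndexWriter is closed, as described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one segment's metadata; not a Lucene class.
final class SegmentEntry {
    final String name;
    // Timestamp of the newest document in the segment (assumed to be
    // tracked by the application; Lucene does not record this itself).
    final long maxTimestampMillis;

    SegmentEntry(String name, long maxTimestampMillis) {
        this.name = name;
        this.maxTimestampMillis = maxTimestampMillis;
    }
}

// Hypothetical helper modelling the "edit" step on an in-memory list.
final class SegmentListEditor {
    // Returns a new list holding only the segments whose newest document
    // is at or after the cutoff. The input list is left unmodified.
    static List<SegmentEntry> dropOlderThan(List<SegmentEntry> segments, long cutoffMillis) {
        List<SegmentEntry> kept = new ArrayList<>();
        for (SegmentEntry s : segments) {
            if (s.maxTimestampMillis >= cutoffMillis) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

Returning a new list rather than mutating the input keeps the original segment list available until the edited one has been written successfully.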

On Tue, Jun 19, 2012 at 3:44 PM, Simon Willnauer
<simon.willnauer@googlemail.com> wrote:
> On Tue, Jun 19, 2012 at 6:42 PM, mark harwood <markharw00d@yahoo.co.uk> wrote:
>> There are a number of scenarios where Lucene might be used to index a fixed time
>> range on a continuous stream of data, e.g. a news feed.
>>
>> In these scenarios I imagine the following facilities would be useful:
>>
>> a) A MergePolicy that organized content into segments on the basis of increasing
>> time units, e.g. 5 min -> 10 min -> 1 hour -> 1 day
>> b) The ability to drop entire segments, e.g. the day-level segment from exactly a
>> week ago
>
> You can do that by subclassing IndexWriter and calling some package-private
> APIs / members. We can certainly make that easier, but I personally don't want
> to open this up as a public API. I can certainly imagine having a
> protected API that allows dropping an entire segment.
>
> simon
>
>> c) Various new analysis functions comparing term frequencies across time, e.g. discovery
>> of "trending" topics.
>>
>> I can see that a) could be implemented using a custom MergePolicy and c) can be done
>> via existing APIs, but I'm not sure if there is a way to simply drop entire segments currently?
>>
>> Anyone else had thoughts in this area?
>>
>> Cheers
>> Mark
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
>
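
The time-unit progression in item a) of the quoted message (5 min -> 10 min -> 1 hour -> 1 day) can be sketched as simple bucket arithmetic. This is only an illustration of the grouping rule a custom MergePolicy might apply, not a MergePolicy implementation: TimeTier and TimeTierRollup are hypothetical names, and the sketch assumes each segment can be tagged with a representative timestamp in epoch milliseconds.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical tiers matching the example progression in the thread.
enum TimeTier {
    FIVE_MINUTES(TimeUnit.MINUTES.toMillis(5)),
    TEN_MINUTES(TimeUnit.MINUTES.toMillis(10)),
    ONE_HOUR(TimeUnit.HOURS.toMillis(1)),
    ONE_DAY(TimeUnit.DAYS.toMillis(1));

    final long millis;

    TimeTier(long millis) {
        this.millis = millis;
    }

    // Start of the bucket at this tier that contains the timestamp.
    long bucketStart(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, millis);
    }

    // Next larger tier, or this tier when it is already the largest.
    TimeTier next() {
        int i = ordinal() + 1;
        return i < values().length ? values()[i] : this;
    }
}

// Hypothetical helper: decides whether two segments belong together
// once they are rolled up into the next tier.
final class TimeTierRollup {
    static boolean sameBucketAtNextTier(TimeTier tier, long tsA, long tsB) {
        TimeTier up = tier.next();
        return up.bucketStart(tsA) == up.bucketStart(tsB);
    }
}
```

With day-level buckets in place, dropping "the day-level segment from exactly a week ago" reduces to selecting the segment whose ONE_DAY bucket start is seven days before the current one.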


