Hi,
Our application indexes its logging events as documents. When the
index reaches a limit, I want to delete the oldest 1 million events. Since
the number of events per day varies from day to day, I cannot just
blindly delete the last 3 days, for instance.
Based on your various inputs, I decided to run a query with max = 1 million,
sorted by index order. I take the last document returned, read its timestamp,
then delete using a new query that adds a criterion on the timestamp
field. This is good enough.
Thanks all for your help,
Vincent
Chris Hostetter <hossman_lucene@fucit.org>
14.09.2011 22:04
Please respond to: java-user@lucene.apache.org
To: java-user@lucene.apache.org, simon.willnauer@gmail.com
Subject: Re: deleting with sorting and max document
: can you provide your query which yields all the documents that you
: want to delete? I don't understand how the sort order changes anything
: here. if you want to only delete the top N docs of that query you
: should maybe modify your query to only return those. I could imagine
: you are returning the oldest first, if so can't you do a range filter
: on top instead of sorting?
I suspect the succinct problem description is something like "I want to
only have the X newest docs that match query Q in my index, so I want to
execute Q, find the total number of matches N, and then delete the first
N-X docs matching Q when sorted by field F"
Hypothetical example: a news aggregation site, with various contracts
with other news sites that say things like "only allowed to redisplay at
most 1000 articles from the NY Times at any one time" and the people
running the site want to always include the 1000 newest NYT articles and
delete the older ones.
I suspect the most efficient way to deal with this would be to give every
document a unique id that is guaranteed to always increase. Then decide
how many docs you need to delete, execute a query sorting on that id
field asc using that number of docs as the size of a TopFieldDocs, and find
the id of the "newest" doc that you want to delete. Then reformulate the
query to include a range query on the id field with that value. If the
number of docs to delete is too big to deal with in one TopFieldDocs, then
paginate through until you get the number you need.
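
A minimal sketch of the cutoff idea described above, written as a plain-Java
in-memory model rather than against the Lucene API. The class and method
names (IdCutoffSketch, cutoffId, deleteUpTo) are hypothetical and only
illustrate the two steps: find the id of the last doc to delete, then delete
by range on that id. Pagination is not modeled here.

```java
import java.util.Arrays;

/** In-memory model of "find a cutoff id, then delete by range". Not Lucene API. */
public class IdCutoffSketch {

    /**
     * Models a search sorted ascending on the id field and limited to
     * numToDelete hits: returns the id of the last hit, i.e. the "newest"
     * doc that should be deleted.
     */
    public static long cutoffId(long[] ids, int numToDelete) {
        if (numToDelete < 1 || numToDelete > ids.length) {
            throw new IllegalArgumentException("numToDelete out of range");
        }
        long[] sorted = ids.clone();
        Arrays.sort(sorted);              // sort on the id field, ascending
        return sorted[numToDelete - 1];   // last hit of the size-limited result
    }

    /** Models the follow-up delete with a range on the id field: keeps ids above the cutoff. */
    public static long[] deleteUpTo(long[] ids, long cutoff) {
        return Arrays.stream(ids).filter(id -> id > cutoff).toArray();
    }

    public static void main(String[] args) {
        long[] ids = {17, 3, 42, 8, 25, 11};       // assumed unique, ever-increasing ids
        long cutoff = cutoffId(ids, 3);            // delete the 3 oldest docs
        long[] remaining = deleteUpTo(ids, cutoff);
        System.out.println("cutoff id = " + cutoff);
        System.out.println("remaining = " + Arrays.toString(remaining));
    }
}
```

Because the ids are assumed unique, the range delete removes exactly the
requested number of docs in this model.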
(you can do the same thing w/o the unique id using a date field, but you
run the risk of overdeleting if multiple docs have the same date)
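
To make the overdeleting risk concrete, here is a small plain-Java sketch
(again an in-memory model, not Lucene code; TimestampTieSketch and
deletedByTimestampRange are hypothetical names). It counts how many docs a
"timestamp <= cutoff" delete would remove when the cutoff is taken from the
last of the N oldest docs.

```java
import java.util.Arrays;

/** In-memory model of a range delete on a non-unique date field. Not Lucene API. */
public class TimestampTieSketch {

    /**
     * Takes the timestamp of the numToDelete-th oldest doc as the cutoff and
     * returns how many docs have a timestamp at or below that cutoff.
     */
    public static int deletedByTimestampRange(long[] timestamps, int numToDelete) {
        if (numToDelete < 1 || numToDelete > timestamps.length) {
            throw new IllegalArgumentException("numToDelete out of range");
        }
        long[] sorted = timestamps.clone();
        Arrays.sort(sorted);                       // oldest first
        long cutoff = sorted[numToDelete - 1];     // timestamp of the last doc to delete
        return (int) Arrays.stream(timestamps).filter(t -> t <= cutoff).count();
    }

    public static void main(String[] args) {
        long[] timestamps = {100, 200, 200, 200, 300};   // three docs share timestamp 200
        int deleted = deletedByTimestampRange(timestamps, 2);
        System.out.println("asked to delete 2, range delete removes " + deleted);
    }
}
```

With these sample values the cutoff is 200, so the range delete removes four
docs although only two were requested. With distinct timestamps the count
matches the request.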
-Hoss