lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: deleting with sorting and max document
Date Wed, 14 Sep 2011 20:03:28 GMT

: can you provide your query which yields all the documents that you
: want to delete? I don't understand how the sort order changes anything
: here. if you want to only delete the top N docs of that query you
: should maybe modify your query to only return those. I could imagine
: you are returning the oldest first, if so can't you do a range filter
: on top instead of sorting?

i suspect the succinct problem description is something like "i want to 
only have the X newest docs that match query Q in my index, so i want to 
execute Q, find the total number of matches N, and then delete the first 
N-X docs matching Q when sorted by field F"

Hypothetical example: a news aggregation site, with various contracts 
with other news sites that say things like "only allowed to redisplay at 
most 1000 articles from the NY Times at any one time" and the people 
running the site want to always include the 1000 newest NYT articles and 
delete the older ones.

I suspect the most efficient way to deal with this would be to give every 
document a unique id that is guaranteed to always increase.  then decide 
how many docs you need to delete, and execute a query sorting on that id 
field asc using that num docs as the size of a TopFieldDocs, and find the 
id of the "newest" doc that you want to delete, then reformulate the query 
to include a range query on the id field with that value as the upper 
bound.  if the num of docs to delete is too big to deal with in a single 
TopFieldDocs, then paginate through until you get the number you need.
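
The steps above can be sketched in plain Java. This is only an in-memory model of the idea, not Lucene code: the `Doc` class, the `trim` method, and the `List`/`Predicate` stand-ins for the index and the query Q are all hypothetical names made up for illustration. In a real index the sorted search and the final delete-by-range-query would go through the searcher and writer instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TrimToNewest {

    /** Hypothetical stand-in for an indexed document: an ever-increasing id plus a source label. */
    static final class Doc {
        final long id;
        final String source;

        Doc(long id, String source) {
            this.id = id;
            this.source = source;
        }
    }

    /**
     * Keeps only the `keep` newest docs matching `q` (newest = largest id)
     * and returns how many docs were deleted.
     */
    static int trim(List<Doc> index, Predicate<Doc> q, int keep) {
        // Execute Q and find the total number of matches N.
        long n = index.stream().filter(q).count();
        int toDelete = (int) (n - keep);
        if (toDelete <= 0) {
            return 0; // already at or under the limit
        }

        // Sort matches by id ascending and take only the first N-X of them,
        // mimicking a sorted search whose result size is the number to delete.
        List<Doc> oldest = index.stream()
                .filter(q)
                .sorted(Comparator.comparingLong((Doc d) -> d.id))
                .limit(toDelete)
                .collect(Collectors.toList());

        // The last entry is the "newest" doc that should be deleted.
        long cutoff = oldest.get(oldest.size() - 1).id;

        // Reformulated delete: Q AND id <= cutoff.
        int before = index.size();
        index.removeIf(d -> q.test(d) && d.id <= cutoff);
        return before - index.size();
    }

    public static void main(String[] args) {
        List<Doc> index = new ArrayList<>();
        for (long id = 1; id <= 10; id++) {
            index.add(new Doc(id, id % 2 == 0 ? "nyt" : "other"));
        }
        int deleted = trim(index, d -> d.source.equals("nyt"), 3);
        System.out.println("deleted " + deleted + ", remaining " + index.size());
    }
}
```

Because the ids are unique, the range clause matches exactly the docs that were counted, so the delete can't remove more than N-X docs matching Q.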

(you can do the same thing w/o the unique id using a date field, but you 
run the risk of overdeleting if multiple docs have the same date)
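
To make the overdeleting risk concrete, here is a small self-contained sketch (again plain Java, not Lucene; the class name, method name, and sample values are invented for illustration). It counts how many docs a "field <= cutoff" delete would remove when the cutoff is taken from the last of the docs you meant to delete.

```java
import java.util.Arrays;

public class DateCutoffOverdelete {

    /**
     * Returns how many entries a "key <= cutoff" delete removes, where the
     * cutoff is the key of the toDelete-th oldest entry.
     */
    static int deletedByCutoff(long[] sortKeys, int toDelete) {
        if (toDelete <= 0) {
            return 0;
        }
        if (toDelete >= sortKeys.length) {
            return sortKeys.length;
        }
        long[] sorted = sortKeys.clone();
        Arrays.sort(sorted);
        long cutoff = sorted[toDelete - 1];

        int count = 0;
        for (long key : sortKeys) {
            if (key <= cutoff) {
                count++; // ties with the cutoff value are deleted too
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Unique, increasing ids: asking for 2 deletes removes exactly 2.
        long[] ids = {1, 2, 3, 4, 5};
        // Dates as yyyymmdd; three docs share 20110902.
        long[] dates = {20110901, 20110902, 20110902, 20110902, 20110903};

        System.out.println("by id:   " + deletedByCutoff(ids, 2));
        System.out.println("by date: " + deletedByCutoff(dates, 2));
    }
}
```

With the sample dates, asking for 2 deletes removes 4 docs, since every doc sharing the cutoff date falls inside the range.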

