lucene-java-user mailing list archives

From "chan kang" <>
Subject Merging partially sorted indices to form a new fully sorted single index?
Date Thu, 30 Mar 2006 16:27:24 GMT
Hi, I've been trying to show query results in reverse-chronological order
and found that the best way to do so is to pre-sort the index if possible,
so that, when searching, the relevant documents come back in
reverse-chronological order (the most recent document at the top) even
without sorting at query time.
Although presorting the index in chronological order is easy (just
addDocument() for each new incoming document, and optimize), the reverse
seems to be
The way I'm handling it now is to

1. index without ordering.
2. sort the index reverse-chronologically
3. re-index and optimize.
4. when a new document comes in, do steps 1-3 again..

Steps 1-3 are not that different from sorting in chronological order, but
when it comes to step 4, the process becomes very redundant.
For example, if I wanted to show all search results sorted so that the
most recent document comes first, I would have to go through steps 1-3
every time a new document is added (by crawling the web or

So I thought that if the following were possible, it would be much easier:
1. create a new index for incoming documents
2. sort it reverse-chronologically -> index_new
3. use addIndexes() and do "index_new.addIndexes(old_index)"
4. optimize

That way, the new index is sorted, the old index (which is much larger
than the incoming one) is also sorted, and the two sorted indexes can be
merged into a final, fully sorted index. This would avoid re-indexing the
whole set of documents in the original index by time.
However, I'm not sure whether addIndexes() preserves document order.
Does it?
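
To make the idea concrete, here is a minimal sketch in plain Java that does not use Lucene at all. It models each index as a list of document timestamps and ASSUMES that appending one index to another keeps each index's internal order, which is exactly the property of addIndexes() I am unsure about. The class and method names (SortedAppendSketch, sortNewestFirst, append, isNewestFirst) are made up for illustration. Under that assumption, the result is fully sorted only when every incoming document is newer than every document already in the old index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedAppendSketch {

    // Step 2 of the proposal: sort only the small incoming batch, newest first.
    static List<Long> sortNewestFirst(List<Long> timestamps) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Collections.reverseOrder());
        return sorted;
    }

    // Steps 3-4, modeled as plain concatenation: new run first, old run after.
    // ASSUMPTION: the real merge keeps each run's internal order. This is a
    // stand-in for the index merge, not the Lucene API.
    static List<Long> append(List<Long> newRun, List<Long> oldRun) {
        List<Long> result = new ArrayList<>(newRun);
        result.addAll(oldRun);
        return result;
    }

    // True if no entry is older than the entry that follows it.
    static boolean isNewestFirst(List<Long> timestamps) {
        for (int i = 1; i < timestamps.size(); i++) {
            if (timestamps.get(i - 1) < timestamps.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> oldIndex = Arrays.asList(300L, 200L, 100L); // already sorted
        List<Long> incoming = Arrays.asList(400L, 500L, 450L); // unsorted batch

        List<Long> merged = append(sortNewestFirst(incoming), oldIndex);
        System.out.println(merged + " newestFirst=" + isNewestFirst(merged));
    }
}
```

If an incoming document were older than something already in the old index, plain appending would no longer be enough and the two runs would have to be interleaved by date.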

Also, is there a better way to do this, that is, to sort partially and
append the two sorted indices to form a single sorted index?

Thanks in advance.

