lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: posting list traversal code
Date Thu, 13 Jun 2013 07:20:43 GMT
Hi,

On Thu, Jun 13, 2013 at 8:24 AM, Denis Bazhenov <dotsid@gmail.com> wrote:
> A document ID at the index level is the offset of the document in the index. It can
> change over time for the same document, for example when several segments are merged.
> Document IDs are also stored in order in posting lists, which allows fast posting list
> intersection. Some Lucene APIs explicitly state that they operate on document IDs in
> order (like TermDocs); some allow out-of-order processing (like Collector). So it
> really depends.
>
> In the case of SortingAtomicReader, as far as I know, it calculates a document
> permutation, which yields sorted docIDs on the output. So it basically relabels
> documents.

This is correct. The org.apache.lucene.index.sorter.Sorter.sort method
computes a permutation of the doc IDs that makes them sorted according
to the sort order. SortingAtomicReader is just a view over an
AtomicReader that uses this permutation to relabel doc IDs and give
the impression that the index is sorted. But this class is not very
interesting by itself and can be very slow to decode postings: for
each term, it needs to load all postings into memory and sort them
before returning an enumeration of the doc IDs (see the
SortingDocsEnum class). It is only useful to sort indices offline with
IndexWriter.addIndexes or online with SortingMergePolicy.
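
To make the relabeling idea concrete, here is a minimal standalone
sketch in plain Java. It is not Lucene's code and does not use the
Lucene API: the class name DocIdRelabelSketch, the method names, and
the use of a long[] of per-document sort keys are all assumptions made
for illustration. It only shows the general technique: build an
old-to-new doc ID permutation from the sort keys, then remap one
term's postings and re-sort them, which is the per-term cost mentioned
above.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, not part of Lucene.
public class DocIdRelabelSketch {

    /** Returns oldToNew, where oldToNew[oldDocId] is the doc ID after sorting by sortKeys. */
    public static int[] computeOldToNew(long[] sortKeys) {
        Integer[] newToOld = new Integer[sortKeys.length];
        for (int i = 0; i < newToOld.length; i++) {
            newToOld[i] = i;
        }
        // Sorting an object array is stable, so documents with equal keys
        // keep their original relative order.
        Arrays.sort(newToOld, (a, b) -> Long.compare(sortKeys[a], sortKeys[b]));

        // Invert newToOld to get the old -> new mapping.
        int[] oldToNew = new int[sortKeys.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    /** Relabels one term's postings. The whole list is held in memory and re-sorted. */
    public static int[] relabelPostings(int[] postings, int[] oldToNew) {
        int[] relabeled = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            relabeled[i] = oldToNew[postings[i]];
        }
        // Remapped IDs are generally out of order, so they must be sorted
        // again before they can be enumerated in increasing doc ID order.
        Arrays.sort(relabeled);
        return relabeled;
    }

    public static void main(String[] args) {
        long[] sortKeys = {30L, 10L, 20L, 5L, 15L}; // one assumed sort key per old doc ID
        int[] oldToNew = computeOldToNew(sortKeys);
        int[] postings = {0, 2, 3};                 // old doc IDs containing some term
        System.out.println(Arrays.toString(oldToNew));                            // [4, 1, 3, 0, 2]
        System.out.println(Arrays.toString(relabelPostings(postings, oldToNew))); // [0, 3, 4]
    }
}
```

The extra sort in relabelPostings is done once per term, which is why
a view of this kind is slow to query directly and is better suited to
producing a physically sorted index once.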

--
Adrien


