lucene-java-user mailing list archives

From Denis Bazhenov <dot...@gmail.com>
Subject Re: posting list traversal code
Date Thu, 13 Jun 2013 06:24:38 GMT
At the index level, a document ID is the offset of the document in the index. It can change over time
for the same document, for example when several segments are merged. Document IDs are also stored in
increasing order in posting lists, which allows fast posting list intersection. Some Lucene APIs explicitly
state that they operate on document IDs in order (like TermDocs); others allow out-of-order
processing (like Collector). So it really depends.
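
To illustrate why ordered doc IDs make intersection fast, here is a minimal sketch of a two-pointer merge over two sorted posting lists. It is not Lucene's actual code; the class name PostingIntersection, the method intersect, and the sample arrays are made up for the example, and posting lists are assumed to be plain int arrays with strictly increasing doc IDs.

```java
import java.util.Arrays;

public class PostingIntersection {

    /**
     * Intersects two posting lists in O(a.length + b.length).
     * Assumption: both arrays hold strictly increasing doc IDs.
     */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i];   // doc ID present in both lists
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;               // a is behind, advance it
            } else {
                j++;               // b is behind, advance it
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] termA = {1, 4, 7, 9, 12};          // hypothetical postings
        int[] termB = {2, 4, 9, 10, 12, 15};     // hypothetical postings
        System.out.println(Arrays.toString(intersect(termA, termB))); // prints [4, 9, 12]
    }
}
```

Each list is walked once and never rewound, which only works because both lists are sorted by doc ID.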

In the case of SortingAtomicReader, as far as I know, it calculates a document permutation, which
makes the docIDs come out in sorted order on output. So it basically relabels the documents.
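
The relabeling idea can be sketched in plain Java as follows. This is only an illustration under my own assumptions, not the SortingAtomicReader implementation: the class DocIdRelabel, its methods, and the sort keys are hypothetical. It computes a permutation from old doc IDs to new ones and then rewrites a posting list so that it is again in increasing doc ID order.

```java
import java.util.Arrays;

public class DocIdRelabel {

    /** Returns oldToNew, where oldToNew[oldDocId] is the doc's rank after sorting by key. */
    static int[] oldToNew(final long[] sortKeys) {
        Integer[] newToOld = new Integer[sortKeys.length];
        for (int i = 0; i < newToOld.length; i++) {
            newToOld[i] = i;
        }
        // Stable sort: docs with equal keys keep their old relative order.
        Arrays.sort(newToOld, (x, y) -> Long.compare(sortKeys[x], sortKeys[y]));
        int[] map = new int[sortKeys.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            map[newToOld[newId]] = newId;
        }
        return map;
    }

    /** Rewrites a posting list with the new doc IDs and restores increasing order. */
    static int[] remap(int[] postings, int[] oldToNew) {
        int[] out = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            out[i] = oldToNew[postings[i]];
        }
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        long[] keys = {30, 10, 20, 10};           // hypothetical sort key per old doc ID
        int[] map = oldToNew(keys);
        System.out.println(Arrays.toString(map));                          // prints [3, 0, 2, 1]
        System.out.println(Arrays.toString(remap(new int[]{0, 2}, map)));  // prints [2, 3]
    }
}
```

After the relabeling, every posting list is ordered by the new doc IDs, so code that relies on in-order doc IDs keeps working on the sorted index.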

On Jun 13, 2013, at 4:38 PM, Sriram Sankar <sankar@gmail.com> wrote:

> Thanks Denis.  I've been looking at the code in more detail now.  I'm
> interested in how the new SortingAtomicReader works.  Suppose I build an
> index and sort the documents using my own sorting function - as shown in
> the docs:
> 
> AtomicReader sortingReader = new SortingAtomicReader(reader, sorter);
> 
> writer.addIndexes(sortingReader);
> 
> When the docs are sorted using my function, I assume the docids are not
> going to be in order any more?  Unless the docids change to maintain the
> sorted order.
> 
> If you look at the code in (for example) ConjunctionScorer.doNext(doc),
> what is the "doc" that gets used here?  If it is the docid (and they are
> out of order), this method will not work.  So either the docids have to be
> in order, or the "doc" here is some other number that defines the position
> of the document in the posting list.
> 
> I'm trying to read the code to understand this - I'd really appreciate
> someone with more in-depth knowledge of this explaining this and also
> pointing me to somewhere in the code where the magic happens.
> 
> Thanks,
> 
> Sriram.
> 
> 
> 
> 
> On Wed, Jun 12, 2013 at 9:33 PM, Denis Bazhenov <dotsid@gmail.com> wrote:
> 
>> I'm not quite sure what you really need, but as far as I understand, you
>> want to get all document IDs for a given term. If so, the following code
>> will work for you:
>> 
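>> // TermDocs API (Lucene 3.x and earlier): iterate the posting list of one term
>> Term term = new Term("fieldName", "fieldValue");
>> TermDocs termDocs = indexReader.termDocs(term);
>> try {
>>     while (termDocs.next()) {
>>         int docId = termDocs.doc();  // doc IDs arrive in increasing order
>>         // work with the document...
>>     }
>> } finally {
>>     termDocs.close();  // release the underlying resources
>> }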
>> On Jun 13, 2013, at 1:56 PM, Sriram Sankar <sankar@gmail.com> wrote:
>> 
>>> Can someone point me to the code that traverses the posting lists?  I'm
>>> trying to understand how it works.
>>> 
>>> Thanks,
>>> 
>>> Sriram
>> 
>> ---
>> Denis Bazhenov <dotsid@gmail.com>
>> 
>> 
>> 
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 

---
Denis Bazhenov <dotsid@gmail.com>