lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: Re: What is the fastest way to loop over all documents in an index?
Date Tue, 05 Sep 2017 08:34:30 GMT

Hi Ishan,

I saw following loop is suggested for this task in the stack overflow.

for (int i=0; i<reader.maxDoc(); i++)

How can we confirm that internal Lucene IDs are subsequent numbers from 0 to maxDoc()-1?

I thought that they are arbitrary integers.

Ahmet



On Tuesday, September 5, 2017, 7:54:31 AM GMT+3, Ishan Chattopadhyaya <ichattopadhyaya@gmail.com>
wrote: 





Maybe IndexReader#document(), looping over docids is the best here?
http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/index/IndexReader.html#document-int-

On Tue, Sep 5, 2017 at 7:57 AM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
wrote:

> Hi Jean,
>
> I am also interested answers to this question. I need this feature too.
> Currently I am using a hack.
> I create an artificial field (with an artificial token) attached to every
> document.
>
> I traverse all documents using the code snippet given in my previous
> related question. (no one answered to it)
>
> http://lucene.472066.n3.nabble.com/PostingsEnum-for-
> documents-that-does-not-contain-a-term-td4349482.html
> I found EverythingEnum class in the Lucene50PostingsReader.java, but I
> couldn't figure out how to use it.
> So, I do not know if this class is for the task, but its name looks
> promising.
> Thanks,Ahmet
>
>
>
> On Tuesday, September 5, 2017, 3:09:37 AM GMT+3, Jean Claude van Johnson <
> vanjohnsonjeanclaude@gmail.com> wrote:
>
>
>
>
>
> Hi there,
>
> I have an use case, were I need to iterate over all documents in an index
> from time to time.
> It seems that the MatchAllDocsQuery is what I should use for this, however
> it creates a bunch of Objects (Score etc) that I don’t really need.
>
> My question to you is:
>
> What is the fastest way to loop over all documents in an index?
> Is it looping over all possible doc id’s (+filtering out deleted
> documents)?
>
> Thank you very much.
>
> Best regards
> Claude
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message