lucene-java-user mailing list archives

From Ishan Chattopadhyaya <ichattopadhy...@gmail.com>
Subject Re: Re: What is the fastest way to loop over all documents in an index?
Date Tue, 05 Sep 2017 18:51:37 GMT
I believe that's the case. Leave the deleted docs out, though (they can be
identified by intersecting with the index's live-docs bitset).
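
A minimal sketch of the loop discussed in this thread, under stated assumptions. It uses java.util.BitSet as a stand-in for the index's live-docs bitset, and the names LiveDocLoop and liveDocIds are made up for illustration. In Lucene itself the upper bound would come from reader.maxDoc(), and in the 6.x line the live-docs bits can, as far as I know, be obtained with MultiFields.getLiveDocs(reader), which returns null when the index has no deletions. The sketch mirrors that null convention.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class LiveDocLoop {

    // Returns every doc ID in [0, maxDoc) whose bit is set in liveDocs.
    // A null liveDocs is treated as "no deletions", so every ID counts as live.
    // (Hypothetical helper for illustration; not part of any Lucene API.)
    public static List<Integer> liveDocIds(int maxDoc, BitSet liveDocs) {
        List<Integer> ids = new ArrayList<>();
        for (int docId = 0; docId < maxDoc; docId++) {
            if (liveDocs == null || liveDocs.get(docId)) {
                ids.add(docId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        int maxDoc = 5;                  // stand-in for reader.maxDoc()
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, maxDoc);         // start with all documents live
        liveDocs.clear(1);               // mark docs 1 and 3 as deleted
        liveDocs.clear(3);

        System.out.println(liveDocIds(maxDoc, liveDocs)); // prints [0, 2, 4]
    }
}
```

In a real index, the body of the loop would load or inspect the document for each surviving ID instead of collecting the IDs into a list.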

On Tue, Sep 5, 2017 at 2:04 PM, Ahmet Arslan <iorixxx@yahoo.com> wrote:

>
> Hi Ishan,
>
> I saw the following loop suggested for this task on Stack Overflow.
>
> for (int i=0; i<reader.maxDoc(); i++)
>
> How can we confirm that internal Lucene IDs are consecutive numbers from 0
> to maxDoc()-1?
>
> I thought they were arbitrary integers.
>
> Ahmet
>
>
>
>
> On Tuesday, September 5, 2017, 7:54:31 AM GMT+3, Ishan Chattopadhyaya
> <ichattopadhyaya@gmail.com> wrote:
>
>
>
>
>
> Maybe calling IndexReader#document() while looping over docids is the best
> option here?
> http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/index/IndexReader.html#document-int-
>
> On Tue, Sep 5, 2017 at 7:57 AM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
> wrote:
>
> > Hi Jean,
> >
> > I am also interested in answers to this question. I need this feature too.
> > Currently I am using a hack.
> > I create an artificial field (with an artificial token) attached to every
> > document.
> >
> > I traverse all documents using the code snippet given in my previous
> > related question (no one answered it).
> >
> > http://lucene.472066.n3.nabble.com/PostingsEnum-for-documents-that-does-not-contain-a-term-td4349482.html
> >
> > I found the EverythingEnum class in Lucene50PostingsReader.java, but I
> > couldn't figure out how to use it.
> > So I do not know whether this class is meant for the task, but its name
> > looks promising.
> >
> > Thanks,
> > Ahmet
> >
> >
> >
> > On Tuesday, September 5, 2017, 3:09:37 AM GMT+3, Jean Claude van Johnson
> > <vanjohnsonjeanclaude@gmail.com> wrote:
> >
> >
> >
> >
> >
> > Hi there,
> >
> > I have a use case where I need to iterate over all documents in an index
> > from time to time.
> > It seems that MatchAllDocsQuery is what I should use for this; however,
> > it creates a bunch of objects (Score etc.) that I don’t really need.
> >
> > My question to you is:
> >
> > What is the fastest way to loop over all documents in an index?
> > Is it looping over all possible doc IDs (plus filtering out deleted
> > documents)?
> >
> > Thank you very much.
> >
> > Best regards
> > Claude
> >
>
