lucene-java-user mailing list archives

From Cristian Lorenzetto <cristian.lorenze...@gmail.com>
Subject Re: docid is just a signed int32
Date Thu, 18 Aug 2016 14:35:16 GMT
Maybe Lucene has a maximum size of 2^31 because result sets are Java
arrays, whose length is an int.
A suggestion for a possible future change is to use an Iterator instead of
a Java array. An Iterator is a more scalable ADT: it does not need memory
to hold all the returned documents at once.


2016-08-18 16:03 GMT+02:00 Glen Newton <glen.newton@gmail.com>:

> Or maybe it is time Lucene re-examined this limit.
>
> There are use cases out there where >2^31 does make sense in a single index
> (huge number of tiny docs).
>
> Also, I think the underlying hardware and the JDK have advanced enough to
> make this more defensible.
>
> Constructively,
> Glen
>
>
> On Thu, Aug 18, 2016 at 9:55 AM, Adrien Grand <jpountz@gmail.com> wrote:
>
> > No, IndexWriter enforces that the number of documents cannot go over
> > IndexWriter.MAX_DOCS (which is a bit less than 2^31), and
> > BaseCompositeReader computes the number of documents in a long variable
> > and ensures it is less than 2^31, so you cannot have indexes that contain
> > more than 2^31 documents.
> >
> > Larger collections should be written to multiple shards, using
> > TopDocs.merge to merge the results.
> >
> > On Thu, 18 Aug 2016 at 15:38, Cristian Lorenzetto <
> > cristian.lorenzetto@gmail.com> wrote:
> >
> > > docid is a signed int32, so it is not very big, but docid really seems
> > > to be not an unmodifiable primary key but a temporary id for the view
> > > related to a specific search.
> > >
> > > So a repository can contain more than 2^31 documents.
> > >
> > > Is my deduction correct? Is there a maximum size for a Lucene index?
> > >
> >
>
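A minimal plain-Java sketch of the sharding approach described above, assuming `MAX_DOCS_PER_SHARD` mirrors `IndexWriter.MAX_DOCS` (Integer.MAX_VALUE - 128 in Lucene 5.x/6.x; check the constant in your version). `ShardedSearchSketch`, `Hit`, `shardsNeeded`, and `mergeTopN` are hypothetical names; in real code the merge step would be Lucene's `TopDocs.merge`, whose exact signature varies between versions. Each hit is addressed by the pair (shard, docid), packed into a long, which fits the reading that a docid is only an id for one view of one shard rather than a stable primary key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardedSearchSketch {
    // Assumption: mirrors IndexWriter.MAX_DOCS as defined in Lucene 5.x/6.x.
    static final int MAX_DOCS_PER_SHARD = Integer.MAX_VALUE - 128;

    // Minimum number of shards so that none exceeds MAX_DOCS_PER_SHARD (ceiling division in long).
    static long shardsNeeded(long totalDocs) {
        return (totalDocs + MAX_DOCS_PER_SHARD - 1) / MAX_DOCS_PER_SHARD;
    }

    // Hypothetical hit: a docid is only meaningful together with the shard it came from.
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }

        // Packs (shard, docid) into one long: an address for this search, not a primary key.
        long globalAddress() {
            return ((long) shard << 32) | (doc & 0xFFFFFFFFL);
        }
    }

    // k-way merge of per-shard hit lists (each sorted by descending score) into the global top N.
    static List<Hit> mergeTopN(int topN, List<List<Hit>> perShard) {
        // Heap entries are {listIndex, position}, ordered by the hit they point to:
        // higher score first, then lower shard, then lower docid (assumed tie-break).
        Comparator<int[]> order = Comparator
                .comparingDouble((int[] e) -> -perShard.get(e[0]).get(e[1]).score)
                .thenComparingInt(e -> perShard.get(e[0]).get(e[1]).shard)
                .thenComparingInt(e -> perShard.get(e[0]).get(e[1]).doc);
        PriorityQueue<int[]> heap = new PriorityQueue<>(order);
        for (int i = 0; i < perShard.size(); i++) {
            if (!perShard.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < topN && !heap.isEmpty()) {
            int[] e = heap.poll();
            List<Hit> list = perShard.get(e[0]);
            merged.add(list.get(e[1]));
            if (e[1] + 1 < list.size()) {
                heap.add(new int[] {e[0], e[1] + 1}); // advance within the same shard
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(shardsNeeded(3_000_000_000L)); // prints 2
        List<List<Hit>> perShard = List.of(
                List.of(new Hit(0, 5, 0.9f), new Hit(0, 1, 0.4f)),
                List.of(new Hit(1, 7, 0.8f), new Hit(1, 2, 0.4f)));
        for (Hit h : mergeTopN(3, perShard)) {
            System.out.println(h.shard + ":" + h.doc + " score=" + h.score
                    + " address=" + h.globalAddress());
        }
    }
}
```

Because each shard only returns its own top hits, the merge never needs more than one heap entry per shard, so the combined collection can exceed 2^31 documents without any single int-indexed structure holding all of them.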
