lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: docid is just a signed int32
Date Fri, 19 Aug 2016 01:31:59 GMT
On Thu, Aug 18, 2016 at 11:55 PM, Adrien Grand <jpountz@gmail.com> wrote:
> No, IndexWriter enforces that the number of documents cannot go over
> IndexWriter.MAX_DOCS (which is a bit less than 2^31) and
> BaseCompositeReader computes the number of documents in a long variable and
> ensures it is less than 2^31, so you cannot have indexes that contain more
> than 2^31 documents.
>
> Larger collections should be written to multiple shards and use
> TopDocs.merge to merge results.

But hang on:
* TopDocs#merge still returns a TopDocs.
* TopDocs still uses an array of ScoreDoc.
* ScoreDoc still uses an int doc ID.

Looks like you're still screwed.
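
For what it's worth, the only way around this I can see is on the
application side: as far as I know, TopDocs#merge also fills in
ScoreDoc.shardIndex, so a (shardIndex, doc) pair could be packed into a
long. A minimal sketch, assuming that pairing is acceptable; GlobalDocId
and its methods are hypothetical names of mine, not Lucene API:

```java
// Hypothetical helper, not part of Lucene: packs a shard index and a
// per-shard int doc ID into a single long "global" ID.
final class GlobalDocId {
    private GlobalDocId() {}

    // High 32 bits hold the shard index, low 32 bits hold the doc ID.
    static long pack(int shardIndex, int doc) {
        if (shardIndex < 0 || doc < 0) {
            throw new IllegalArgumentException("shardIndex and doc must be non-negative");
        }
        return ((long) shardIndex << 32) | doc;
    }

    // Recovers the shard index from the high 32 bits.
    static int shardIndex(long globalId) {
        return (int) (globalId >>> 32);
    }

    // Recovers the per-shard doc ID from the low 32 bits.
    static int doc(long globalId) {
        return (int) globalId;
    }

    // Sums per-shard document counts in a long so the total cannot overflow an int.
    static long totalDocs(int... maxDocs) {
        long total = 0;
        for (int maxDoc : maxDocs) {
            total += maxDoc;
        }
        return total;
    }
}
```

Two shards of 2,000,000,000 documents each give a total that no int can
hold, but each hit is still addressable as pack(shardIndex, doc). None
of this helps code that expects a plain int doc ID, of course.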

I wish IndexReader would use long IDs too, because a single IndexReader
can span multiple shards as well. The restriction doesn't make much
sense to me, although "it's hard to fix in a backwards-compatible way"
is certainly a good reason. :D

TX
