lucene-dev mailing list archives
From "Jack Krupansky" <j...@basetechnology.com>
Subject Documenting document limits for Lucene and Solr
Date Wed, 30 May 2012 15:52:14 GMT
AFAICT, there is no clear documentation of the maximum number of documents that can be stored
in a Lucene or Solr index (single core/shard). It appears to be about 2^31 (Integer.MAX_VALUE is 2^31 - 1), since a Lucene document
number and the value returned from IW.maxDoc are both Java “int” values. Lucene users have that “hint”
to guide them, but that hint is never surfaced for Solr users, AFAICT. A few years ago nobody
in their right mind would imagine indexing 2 billion documents in a single machine/core, but
now people are at least tempted to try. So, it is now more important for people to know about
it, up front, not hidden down in the fine print of Lucene file formats.
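
To make the arithmetic concrete, here is a minimal Java sketch of the int ceiling and of what happens when a plain int counter goes one past it. It assumes nothing about Lucene internals; the class name DocLimitDemo and the method addWrapping are made up for illustration only.

```java
public class DocLimitDemo {
    // Largest value a Java int can hold: 2^31 - 1.
    public static final int MAX_INT = Integer.MAX_VALUE;

    // Hypothetical stand-in for an unchecked document counter:
    // plain int addition wraps around silently on overflow.
    public static int addWrapping(int docCount, int added) {
        return docCount + added;
    }

    public static void main(String[] args) {
        System.out.println(MAX_INT);                  // 2147483647
        System.out.println(addWrapping(MAX_INT, 1));  // -2147483648
    }
}
```

The second value is the kind of negative count a user would see if nothing checks the limit.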

I wanted to file a Jira on this, but I wanted to check first whether anybody knows of an existing
Jira for it that may have been worded in a way that escaped my semi-diligent searches.

I was also thinking of filing it as two Jiras, one for Lucene and one for Solr since the doc
would be in different places. Or, should there be one combined “Lucene/Solr Capacity Limits/Planning”
wiki? Unless somebody objects, I’ll file as two separate (but linked) issues.

And, I was also thinking of filing two more Jiras, one each for Lucene and Solr, to add a robust check
for exceeding the underlying Lucene limit and to report the condition in a well-defined manner
rather than letting “numFound” or “maxDoc” go negative. But this is separate from the documentation
issue, I think. Unless somebody objects, I’ll file these as two separate issues.
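
As a rough idea of what such a check could look like, here is a small Java sketch that fails loudly instead of letting a count wrap negative. This is not Lucene or Solr code; the class DocCountGuard and the method checkedDocCount are hypothetical names, and the real limit enforced by Lucene may differ from Integer.MAX_VALUE. It uses Math.addExact, which requires Java 8 or later.

```java
public class DocCountGuard {
    // Hypothetical helper, not part of Lucene or Solr: adds to a document
    // count and throws a descriptive exception if the int range is exceeded.
    public static int checkedDocCount(int currentDocs, int docsToAdd) {
        try {
            // Math.addExact throws ArithmeticException on int overflow.
            return Math.addExact(currentDocs, docsToAdd);
        } catch (ArithmeticException e) {
            throw new IllegalStateException(
                "Document count would exceed the int limit of " + Integer.MAX_VALUE, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(checkedDocCount(1_000_000, 1));
        try {
            checkedDocCount(Integer.MAX_VALUE, 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point is only that the failure becomes a well-defined exception at the moment the limit is crossed, rather than a negative number surfacing later.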

Any objection to me filing these four issues?

-- Jack Krupansky