lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: Solr content limits?
Date Wed, 27 Aug 2014 13:46:29 GMT
On 8/26/2014 9:36 PM, lalitjangra wrote:
> I am using Solr 4.6.0 with a single collection/core and want to know
> details about the following.
> 
> 1. What is the maximum number of documents which can be uploaded in a single
> collection/core?
> 2. What is the maximum size of a document I can upload in Solr without
> failing?
> 3. Is there any way to update these limits, if possible?

There is exactly one hard limit in Solr that cannot be changed with
configuration.  That limit comes about because the Lucene on-disk index
format uses a 32-bit signed integer value (the "int" or "Integer" data
type in Java) for internal document identifiers.  The biggest number
that type can hold is 2147483647, a little more than two billion, so a
Lucene index cannot contain more than 2147483647 documents.  Deleted
documents count toward that total for as long as they remain in the
index, so it is advisable not to exceed one billion live documents per
Solr core (each core maintains one Lucene index).  Staying at or below
one billion live documents leaves room in the same index for more than
one billion deleted documents.
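
The arithmetic behind that advice can be sketched in a few lines of
Java.  This is only an illustration: the class name DocIdHeadroom, the
LIVE_DOC_BUDGET constant, and the deletedHeadroom helper are made up
for this example and are not part of Lucene or Solr.

```java
// Illustrative only: none of these names exist in Lucene or Solr.
class DocIdHeadroom {
    // Largest value of a 32-bit signed Java int: 2147483647.
    static final int MAX_DOCS = Integer.MAX_VALUE;

    // Assumed per-core budget for live documents, taken from the advice above.
    static final int LIVE_DOC_BUDGET = 1_000_000_000;

    // Document slots left over for deleted documents at a given live count.
    static int deletedHeadroom(int liveDocs) {
        if (liveDocs < 0) {
            throw new IllegalArgumentException("liveDocs must not be negative");
        }
        return MAX_DOCS - liveDocs;
    }

    public static void main(String[] args) {
        System.out.println("Hard limit:       " + MAX_DOCS);
        System.out.println("Deleted headroom: " + deletedHeadroom(LIVE_DOC_BUDGET));
    }
}
```

With one billion live documents, the remaining headroom is 1147483647
document slots.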

You can create Solr indexes well beyond the Lucene limit by going
distributed.  One easy way to do this is with SolrCloud: create a
collection with multiple shards.  Each shard will be limited to
2147483647 documents, but the whole index can span as many shards as
you require.  It is definitely recommended that you spread such an
index across many servers.
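
As a rough planning aid, the minimum shard count for a target index
size can be estimated as below.  This is a sketch under one assumption:
it applies the one-billion-live-documents guideline per shard.  The
class and method names are hypothetical, and real sizing will usually
be driven by hardware resources rather than this ceiling.

```java
// Illustrative only: a back-of-the-envelope shard estimate, not a Solr API.
class ShardEstimate {
    // Assumed budget of live documents per shard (the guideline, not the hard limit).
    static final long LIVE_DOCS_PER_SHARD = 1_000_000_000L;

    // Smallest number of shards that keeps every shard within the budget.
    static long minShards(long totalLiveDocs) {
        if (totalLiveDocs <= 0) {
            throw new IllegalArgumentException("totalLiveDocs must be positive");
        }
        // Ceiling division using long arithmetic.
        return (totalLiveDocs + LIVE_DOCS_PER_SHARD - 1) / LIVE_DOCS_PER_SHARD;
    }

    public static void main(String[] args) {
        // 5.5 billion live documents need at least 6 shards under this budget.
        System.out.println(minShards(5_500_000_000L));
    }
}
```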

There are no limits at all on document size, although you should be
aware that most tokenizers and token filters do have a hard-coded
maximum token size that is typically between 256 and 4096 characters.
A character may be more than one byte.  For example, the wide
(ideographic) space common in East Asian languages, U+3000, takes up
three bytes in UTF-8 encoding:

http://www.fileformat.info/info/unicode/char/3000/index.htm
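
The difference between characters and bytes is easy to check from Java.
The class and method names here are invented for the example; only
String.getBytes and StandardCharsets.UTF_8 come from the JDK.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: compares character count with UTF-8 byte count.
class Utf8Width {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String asciiSpace = " ";            // U+0020
        String ideographicSpace = "\u3000"; // U+3000, the wide space

        // Both strings are one Java char long, but their encoded sizes differ.
        System.out.println(asciiSpace.length() + " char, "
                + utf8Bytes(asciiSpace) + " byte(s)");
        System.out.println(ideographicSpace.length() + " char, "
                + utf8Bytes(ideographicSpace) + " byte(s)");
    }
}
```

The ASCII space encodes to one byte and U+3000 to three, so a limit
expressed in characters does not translate directly into a byte count.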

One final note: You'll almost always encounter resource limits -- RAM,
IOPS, or CPU -- before you actually run into Lucene's one undefeatable
limitation.

Thanks,
Shawn

