lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Solr limitations
Date Wed, 10 Jul 2013 19:38:50 GMT
Also, total index file size. At 200-300 GB, managing an index becomes a pain.

Lance

On 07/08/2013 07:28 AM, Jack Krupansky wrote:
> Other than the per-node/per-collection limit of 2 billion documents 
> per Lucene index, most of the limits of Solr are performance-based 
> limits - Solr can handle it, but the performance may not be 
> acceptable. Dynamic fields are a great example. Nothing prevents you 
> from creating a document with, say, 50,000 dynamic fields, but you are 
> likely to find the performance less than acceptable. Or facets. Sure, 
> Solr will let you have 5,000 faceted fields, but the performance is 
> likely to be... you get the picture.
>
> What is acceptable performance? That's for you to decide.
>
> What will the performance of 5,000 dynamic fields or 500 faceted 
> fields or 500 million documents on a node be? It all depends on your 
> data, especially the cardinality (unique values) of each individual 
> field.
>
> How can you determine the performance? Only one way: Proof of concept. 
> You need to do your own proof of concept implementation, with your own 
> representative data, with your own representative data model, with 
> your own representative hardware, with your own representative client 
> software, with your own representative user query load. That testing 
> will give you all the answers you need.
>
> There are no magic answers. Don't believe any magic spreadsheet or 
> magic wizard. Flip a coin whether they will work for your situation.
>
> Some simple, common sense limits:
>
> 1. No more than 50 to 100 million documents per node.
> 2. No more than 250 fields per document.
> 3. No more than 250K characters per document.
> 4. No more than 25 faceted fields.
> 5. No more than 32 nodes in your SolrCloud cluster.
> 6. Don't return more than 250 results on a query.
>
> None of those is a hard limit, but don't go beyond them unless your 
> Proof of Concept testing proves that performance is acceptable for 
> your situation.
>
> Start with a simple 4-node, 2-shard, 2-replica cluster for preliminary 
> tests and then scale as needed.
>
> Dynamic and multivalued fields? Try to stay away from them - except 
> for the simplest cases, they are usually an indicator of a weak data 
> model. Sure, it's fine to store a relatively small number of values in 
> a multivalued field (say, dozens of values), but be aware that you 
> can't directly access individual values, you can't tell which was 
> matched on a query, and you can't coordinate values between multiple 
> multivalued fields. Except for very simple cases, multivalued fields 
> should be flattened into multiple documents with a parent ID.
>
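The flattening advice above (multivalued fields split into multiple documents with a parent ID) could be sketched roughly as follows. This is only an illustration in plain Python dictionaries, not Solr API code: the function name `flatten`, the `parent_id` field name, and the `<parent id>_<position>` child ID scheme are all assumptions made for the example.

```python
def flatten(doc, multi_fields, id_field="id", parent_field="parent_id"):
    """Split a document with parallel multivalued fields into a parent
    document plus one child document per value position.

    Hypothetical helper: the field names and the child ID scheme are
    illustrative assumptions, not Solr conventions.
    """
    # The parent keeps every field that is not being flattened.
    parent = {k: v for k, v in doc.items() if k not in multi_fields}

    # Values can only be coordinated if the lists line up position by position.
    lengths = {len(doc[f]) for f in multi_fields}
    if len(lengths) != 1:
        raise ValueError("multivalued fields must all have the same length")

    children = []
    for i in range(lengths.pop()):
        # Each child carries its own ID plus a reference back to the parent.
        child = {id_field: f"{doc[id_field]}_{i}", parent_field: doc[id_field]}
        for f in multi_fields:
            child[f] = doc[f][i]
        children.append(child)
    return parent, children


order = {
    "id": "order1",
    "customer": "acme",
    "item_name": ["bolt", "nut"],
    "item_qty": [10, 20],
}
parent, children = flatten(order, ["item_name", "item_qty"])
```

Each child document then holds one `item_name` together with its matching `item_qty`, so a query can match on the pair and report which one matched, which is not possible while the values sit in two separate multivalued fields.
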
> Since you brought up the topic of dynamic fields, I am curious how you 
> got the impression that they were a good technique to use as a 
> starting point. They're fine for prototyping and hacking, and fine 
> when used in moderation, but not when used to excess. The whole point 
> of Solr is searching and searching is optimized within fields, not 
> across fields, so having lots of dynamic fields is counter to the 
> primary strengths of Lucene and Solr. And... schemas with lots of 
> dynamic fields tend to be difficult to maintain. For example, if you 
> wanted to ask a support question here, one of the first things we want 
> to know is what your schema looks like, but with lots of dynamic 
> fields it is not possible to have a simple discussion of what your 
> schema looks like.
>
> Sure, there is something called "schemaless design" (and Solr supports 
> that in 4.4), but that's very different from heavy reliance on dynamic 
> fields in the traditional sense. Schemaless design is A-OK, but using 
> dynamic fields for "arrays" of data in a single document is a poor 
> match for the search features of Solr (e.g., Edismax searching across 
> multiple fields.)
>
> One other tidbit: Although Solr does not enforce naming conventions 
> for field names, and you can put special characters in them, there are 
> plenty of features in Solr, such as the common "fl" parameter, where 
> field names are expected to adhere to Java naming rules. When people 
> start "going wild" with dynamic fields, it is common that they start 
> "going wild" with their names as well, using spaces, colons, slashes, 
> etc. that cannot be parsed in the "fl" and "qf" parameters, for 
> example. Please don't go there!
>
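One way to catch troublesome field names before they reach the index is a small check on the client side. The sketch below assumes a conservative rule (ASCII letters, digits, and underscores, not starting with a digit), which is narrower than full Java identifier rules; the names `SAFE_FIELD_NAME` and `unsafe_field_names` are hypothetical and not part of Solr.

```python
import re

# Assumed conservative rule: ASCII letters, digits and underscores,
# not starting with a digit. Java identifiers allow more than this.
SAFE_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def unsafe_field_names(names):
    """Return the names that do not match the conservative pattern."""
    return [n for n in names if not SAFE_FIELD_NAME.fullmatch(n)]


candidates = ["price_i", "attr color_s", "ns:size", "a/b", "_version_"]
rejected = unsafe_field_names(candidates)
```

Here `rejected` holds the names containing a space, a colon, or a slash, the kinds of characters the message above warns about for the "fl" and "qf" parameters.
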
> In short, put up a small cluster and start doing a Proof of Concept. 
> Stay within my suggested guidelines and you should do okay.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Marcelo Elias Del Valle
> Sent: Monday, July 08, 2013 9:46 AM
> To: solr-user@lucene.apache.org
> Subject: Solr limitations
>
> Hello everyone,
>
>    I am looking for information about possible Solr limitations I
> should consider in my architecture. Things like max number of dynamic
> fields, max number of documents in SolrCloud, etc.
>    Does anyone know where I can find this info?
>
> Best regards,

