Date: Tue, 21 Dec 2010 15:26:33 +0100
To: solr-user@lucene.apache.org
From: "J.J. Larrea"
Subject: Re: Consequences for using multivalued on all fields

Someone please correct me if I am wrong, but as far as I am aware the
index format is identical in either case.

One benefit of allowing one to specify a field as single-valued is
similar to specifying that a field is required: it provides a safeguard
that index data conforms to requirements. So making all fields
multivalued forgoes that integrity check for fields which by definition
should be singular.

Also, depending on the response writer, and for the XMLResponseWriter
the requested response version (see
http://wiki.apache.org/solr/XMLResponseFormat), the multi-valued setting
can determine whether the document values returned from a query will be
scalars (e.g. <int>2010</int>) or arrays of scalars (e.g.
<arr><int>2010</int></arr>), regardless of how many values are actually
stored.

But the most significant gotcha of not specifying the actual arity
(1 or N) arises if any of those fields is used for field-faceting: by
default the field-faceting logic chooses a different algorithm depending
on whether the field is multi-valued, and the default choice for
multi-valued is only appropriate for a small set of enumerated values,
since it creates a filter query for each value in the set. This can have
a profound effect on Solr memory utilization. So if you are not relying
on the field arity setting to select the algorithm, you or your users
might need to specify it explicitly with the f.<fieldname>.facet.method
argument; see http://wiki.apache.org/solr/SolrFacetingOverview for more
info.

So while all-multivalued isn't a showstopper, if it were up to me I'd
want to give users the option to specify arity and whether the field is
required.

- J.J.

At 2:13 PM +0100 12/21/10, Tim Terlegård wrote:
>In our application we use dynamic fields and there can be about 50 of
>them and there can be up to 100 million documents.
>
>Are there any disadvantages to having multivalued=true on all fields in
>the schema? An admin of the application can specify dynamic fields and
>whether they should be indexed or stored. The question is whether we
>gain anything by letting them choose multivalued as well, or whether it
>just adds complexity to the user interface.
>
>Thanks,
>Tim
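
For anyone who wants to try the per-field facet method override mentioned
in the reply, here is a minimal sketch of how the request parameters could
be assembled. The parameter names (facet, facet.field, and the per-field
facet.method override) are Solr's; the helper facet_params, the field name
author_s, and the base URL are assumptions made up for illustration, so
adjust them to your own setup.

```python
from urllib.parse import urlencode, parse_qs


def facet_params(query, field, method=None):
    """Build Solr request parameters for faceting on a single field.

    method may be "enum" or "fc"; None leaves the choice to Solr's
    default, which is the situation the reply warns about.
    """
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.field", field),
    ]
    if method is not None:
        if method not in ("enum", "fc"):
            raise ValueError("unsupported facet method: %r" % (method,))
        # Per-field override of the faceting algorithm.
        params.append(("f.%s.facet.method" % field, method))
    return params


# Hypothetical field name and base URL, for illustration only.
query_string = urlencode(facet_params("*:*", "author_s", method="fc"))
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```

If every field is declared multivalued, an application-side default like
this is one way to keep the faceting algorithm an explicit choice rather
than a side effect of the schema.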