lucene-solr-user mailing list archives

From "J.J. Larrea" <...@panix.com>
Subject Re: Consequences for using multivalued on all fields
Date Tue, 21 Dec 2010 14:26:33 GMT
Someone please correct me if I am wrong, but as far as I am aware the index format is identical
in either case.

One benefit of letting users declare a field single-valued is similar to that of declaring
a field required: it provides a safeguard that index data conforms to requirements.  So making
all fields multivalued forgoes that integrity check for fields which by definition should
be singular.

Also, depending on the response writer (and, for the XMLResponseWriter, on the requested response
version; see http://wiki.apache.org/solr/XMLResponseFormat), the multi-valued setting can determine
whether the document values returned from a query will be scalars (e.g. <str name="year">2010</str>)
or arrays of scalars (<arr name="year"><str>2010</str></arr>), regardless
of how many values are actually stored.
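
A client that has to cope with either shape can normalize both to a list. Here is a minimal
Python sketch of that idea, not anything Solr ships: the function name field_values and the
sample documents are made up for illustration, and it assumes only the two XML shapes quoted
above.

```python
import xml.etree.ElementTree as ET


def field_values(doc_xml, name):
    """Return the values of field `name` as a list of strings.

    Handles both a scalar element (<str name="year">2010</str>) and an
    array of scalars (<arr name="year"><str>2010</str></arr>).
    Returns an empty list if the field is absent.
    """
    doc = ET.fromstring(doc_xml)
    for child in doc:
        if child.get("name") != name:
            continue
        if child.tag == "arr":
            # Multi-valued shape: one child element per value.
            return [item.text for item in child]
        # Single-valued shape: the element itself carries the value.
        return [child.text]
    return []


# Hypothetical sample documents in the two shapes described above.
single = '<doc><str name="year">2010</str></doc>'
multi = '<doc><arr name="year"><str>2010</str></arr></doc>'

print(field_values(single, "year"))
print(field_values(multi, "year"))
```

Both calls yield the same one-element list, so downstream code does not need to know how the
field was declared in the schema.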

But the most significant gotcha of not specifying the actual arity (1 or N) arises if any
of those fields is used for field-faceting: by default the field-faceting logic chooses a
different algorithm depending on whether the field is multi-valued, and the default choice
for multi-valued is appropriate only for a small set of enumerated values, since it creates
a filter query for each value in the set. This can have a profound effect on Solr memory
utilization. So if you are not relying on the field arity setting to select the algorithm,
you or your users might need to specify it explicitly with the f.<field>.facet.method
argument; see http://wiki.apache.org/solr/SolrFacetingOverview for more info.
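
To illustrate the per-field override, here is a small Python sketch that builds the query
string for a faceted request. The helper name facet_query_string and the field names are
invented for the example, and I am assuming the facet.method values "enum" and "fc" as listed
on the Solr wiki; check them against the version you run.

```python
from urllib.parse import urlencode


def facet_query_string(query, facet_fields, methods=None):
    """Build a Solr query string with optional per-field facet methods.

    `methods` maps a field name to a facet.method value (assumed here to
    be "enum" or "fc"); each entry becomes f.<field>.facet.method=<value>.
    """
    params = [("q", query), ("facet", "true")]
    for field in facet_fields:
        params.append(("facet.field", field))
    for field, method in (methods or {}).items():
        params.append((f"f.{field}.facet.method", method))
    return urlencode(params)


# Hypothetical fields: "year" has many distinct values, "status" only a few.
qs = facet_query_string(
    "solr",
    ["year", "status"],
    methods={"year": "fc", "status": "enum"},
)
print(qs)
```

The resulting string would be appended to whatever select URL your installation exposes; an
application that makes every field multivalued could set these overrides on the user's behalf.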

So while all-multivalued isn't a showstopper, if it were up to me I'd want to give users the
option to specify arity and whether the field is required.

- J.J.

At 2:13 PM +0100 12/21/10, Tim Terlegård wrote:
>In our application we use dynamic fields and there can be about 50 of
>them and there can be up to 100 million documents.
>
>Are there any disadvantages to having multivalued=true on all fields in
>the schema? An admin of the application can specify dynamic fields and
>whether they should be indexed or stored. The question is whether we gain
>anything by letting them choose multivalued as well, or whether it just
>adds complexity to the user interface.
>
>Thanks,
>Tim

