lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Multiple documents with different fields in the same collection
Date Thu, 15 Aug 2019 13:13:48 GMT
Vlad:

“different schemas in one schema” is really just multiple <field> and <fieldType>
definitions. There’s no requirement at all that two Solr documents share any fields (well,
except if you have ‘required=“true”’ set for a field, for instance the “id” field).
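As a rough sketch of what "multiple field definitions" can look like, the snippet below builds a request body for the Schema API's add-field command. It assumes the collection uses a managed schema (with a hand-edited schema.xml you would add equivalent <field> entries instead), that the "string" and "text_general" field types exist in your configset, and that the fieldA*/fieldB* names are placeholders. It only builds the JSON and does not send anything to Solr.

```python
import json

def add_field_commands(fields):
    """Build a Schema API body containing one add-field entry per (name, type)."""
    return {
        "add-field": [
            {"name": name, "type": field_type, "stored": True, "indexed": True}
            for name, field_type in fields
        ]
    }

# Hypothetical field names; "key" is the field both document types share.
payload = add_field_commands([
    ("key", "string"),
    ("doc_type", "string"),
    ("fieldA1", "text_general"),
    ("fieldA2", "text_general"),
    ("fieldB1", "text_general"),
])

# This body would be POSTed to /solr/<collection>/schema (not done here).
body = json.dumps(payload, indent=2)
print(body)
```

Type A documents would simply leave fieldB1 empty and type B documents would leave fieldA1 and fieldA2 empty; nothing in these definitions ties a field to a document type.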

I’d also include a “doc_type” field set to A or B to allow you to restrict searches
to a single type if that’s desirable, by adding “fq=doc_type:A” for instance.
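A minimal sketch of such a request, assuming a local Solr at http://localhost:8983, a collection named "mycollection", and the /select handler (all three are placeholders, not something from your setup):

```python
from urllib.parse import urlencode, parse_qs

def select_url(base, query, doc_type=None):
    """Build a /select URL, optionally restricted to one doc_type via fq."""
    params = [("q", query)]
    if doc_type is not None:
        # The filter query narrows the result set without affecting scoring.
        params.append(("fq", f"doc_type:{doc_type}"))
    return f"{base}/select?{urlencode(params)}"

# Hypothetical host, collection, and field names.
url = select_url("http://localhost:8983/solr/mycollection",
                 "fieldA1:val1", doc_type="A")
print(url)
```

Leaving doc_type unset searches both kinds of documents at once.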

Then it’s just a matter of convention. For searching docs of type A, you search in fieldA1,
fieldA2, etc. For type B, you search in fieldB1, fieldB2, etc. You can set up different request
handlers in solrconfig.xml (e.g. like the “select” or “query” handlers) if you’d like different
defaults for the two.

Unused fields add very, very little to search times due to how the inverted index is structured.
When you get to 100s of fields, it might be noticeable with careful measurements. I know of
systems with over 1,000 fields that perform acceptably though.

You have to define what “…these two cores should be combined to produce…” means though.
Unless the documents share some fields, you’d get disjoint sets of documents back. “q=fieldA1:val1
AND fieldB1:val2” would produce no documents at all if no documents had both fields.
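To make that concrete, here is a toy in-memory illustration. It is plain Python, not Solr, and the documents and exact-match logic are invented for the example; it only mimics the boolean semantics of requiring both clauses on the same document.

```python
# Two hypothetical documents that share no searchable fields.
docs = [
    {"id": "a1", "doc_type": "A", "fieldA1": "val1"},
    {"id": "b1", "doc_type": "B", "fieldB1": "val2"},
]

def matches(doc, field, value):
    """True if the document has the field and it equals the value."""
    return doc.get(field) == value

# AND: both clauses must hold for the same document.
and_hits = [d["id"] for d in docs
            if matches(d, "fieldA1", "val1") and matches(d, "fieldB1", "val2")]

# OR: either clause is enough, so each type contributes its own hits.
or_hits = [d["id"] for d in docs
           if matches(d, "fieldA1", "val1") or matches(d, "fieldB1", "val2")]

print(and_hits)
print(or_hits)
```

The AND list comes back empty because no single document carries both fields, while the OR list returns one document of each type with no relationship between them.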

This sounds like stand-alone Solr if you’re talking about “cores”. Is that a conscious
choice? SolrCloud gives you HA/DR even with single-shard collections….

Finally, you say one type of doc is relatively static and one more dynamic. How dynamic? The
one drawback in the above is that if your commit rate is quite high, the static portions of
your index won’t be cached as usefully as they would if they were, indeed, in separate cores
(collections in SolrCloud).

Best,
Erick



> On Aug 14, 2019, at 5:43 PM, Vlad Beznosov <vbeznosov@ritchiebros.com.INVALID> wrote:
> 
> Hello SOLR Users.
> 
> I am new to SOLR, so please forgive me if something in this email will not make sense to some of you.
> 
> Here is the problem I am trying to solve:
> 
> We have a collection of documents A that has corresponding configuration set with schema.xml file in it.
> We need to add another core to that collection, which will contain a document of type B with fields mostly different from document of type A except for the field "key", which is also present in document of type A.
> Indexing of these two cores should be done independently: core A stores dynamic data while data in core B is largely static.
> But for search purposes these two cores should be combined to produce result based on criteria built from the fields from both cores.
> 
> I have found a post that suggests creating a separate schema that will unite the two documents: https://stackoverflow.com/questions/19313910/query-multiple-collections-with-different-fields-in-solr
> 
> So far so good, but now I am trying to figure out how to put it all together: Can I define three different document schemas in the same schema.xml (and if yes, how that can be done), or should I create separate schema.xml files for each document (and if yes, where should they be placed).
> 
> Ideally it would be nice to have this configured within the same collection to make this transparent for the search.
> 
> Any help would be greatly appreciated.
> 
> Thank you,
> Vlad.

