lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: 1 main collection or multiple smaller collections?
Date Thu, 27 Apr 2017 17:08:47 GMT
Design backwards from the search result pages (SRP). Make flat schema(s) with the fields you
will search and display.

One example is the schema I used at Netflix. I used one collection to hold movies, people
(actors), and genres. There were collisions between the integer IDs, so movie IDs were prefixed
with “m”, people with “p”, and genres with “g”. The searched fields were “title”
and “description”. There was also a “type” field which was “movie”, “person”,
or “genre”. There was also a field for the database ID (without the prefix).
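
A minimal sketch of that document layout in Python. The field names "id" and
"db_id" and the helper make_doc are assumptions for illustration; only "title",
"description", "type", and the m/p/g prefixes come from the description above.

```python
# Hypothetical helper: builds one flat document for a shared collection.
# "id" and "db_id" are assumed field names, not taken from the original schema.
PREFIXES = {"movie": "m", "person": "p", "genre": "g"}

def make_doc(doc_type, db_id, title, description):
    """Return a flat document whose unique key is the prefixed database ID."""
    return {
        "id": f"{PREFIXES[doc_type]}{db_id}",  # prefixed, unique across all types
        "type": doc_type,                      # used later for filtering
        "db_id": db_id,                        # original database ID, no prefix
        "title": title,
        "description": description,
    }

# The same integer ID no longer collides across types.
movie = make_doc("movie", 42, "Some Movie", "A movie description.")
person = make_doc("person", 42, "Some Actor", "An actor biography.")
```

Here movie["id"] is "m42" and person["id"] is "p42", so both can live in one
collection with a single uniqueKey field.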

A movie SRP used an “fq” filter of “type:movie”, and so on for other SRPs. There were
a few other filters, like G-rated movies or streaming, DVD, HD DVD, or Blu-ray.
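
As a rough sketch, the request parameters for one SRP could be assembled like
this. The use of edismax with "qf", the helper name srp_params, and the
"rating:G" filter field are assumptions for illustration; only the "type" filter
and the searched fields come from the description above.

```python
from urllib.parse import urlencode

def srp_params(user_query, doc_type, extra_filters=()):
    """Build a Solr query string for one search result page type.

    Assumes the edismax query parser; each filter is sent as its own fq.
    """
    params = [
        ("q", user_query),
        ("defType", "edismax"),          # assumption: edismax parser
        ("qf", "title description"),     # the two searched fields
        ("fq", f"type:{doc_type}"),      # restrict to one document type
    ]
    # Hypothetical extra filters, e.g. "rating:G"; field names are not from the source.
    params += [("fq", f) for f in extra_filters]
    return urlencode(params)

query_string = srp_params("star wars", "movie", extra_filters=["rating:G"])
```

Sending each filter as a separate "fq" lets Solr cache them independently, so
the "type:movie" filter is reused across every movie search.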

The full index was under 350K documents.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Apr 27, 2017, at 10:01 AM, Rick Leir <rleir@leirtech.com> wrote:
> 
> Does it make sense to use nested documents here? Products could be nested in a supplier
> document perhaps.
> 
> Alternately, consider de-normalizing "til it hurts". A product doc might be able to contain
> supplier info.
> 
> On April 27, 2017 8:50:59 AM EDT, Shawn Heisey <apache@elyograg.org> wrote:
>> On 4/26/2017 11:57 PM, Derek Poh wrote:
>>> There are some common fields between them.
>>> At the source data end (database), the supplier info and product info
>>> are updated separately. In this regard, I should separate them?
>>> If it's in 1 single collection, when there are updates to only the
>>> supplier info, the product info will be indexed again even though there
>>> are no updates to it. Is my reasoning valid?
>>> 
>>> 
>>> On 4/27/2017 1:33 PM, Walter Underwood wrote:
>>>> Do they have the same fields or different fields? Are they updated
>>>> separately or together?
>>>> 
>>>> If they have the same fields and are updated together, I’d put them
>>>> in the same collection. Otherwise, probably separate. 
>> 
>> Walter's statements are right on the money, you just might need a little
>> more detail.
>> 
>> There are two critical details that decide whether you even CAN
>> combine different data in a single index: One is that all types of
>> records must use the same field (the uniqueKey field) to determine
>> uniqueness, and the value of this field must be unique across the
>> entire dataset.  The other is that there SHOULD be a field with a name like
>> "type" that your search client can use to differentiate the different
>> kinds of documents.  This type field is not necessary, but it does make
>> things easier.
>> 
>> Assuming you CAN combine documents, there is still the question of
>> whether you SHOULD.  If the fields that you will commonly search are the
>> same between the different kinds of documents, and if people want to be
>> able to do one search and get more than one of the document types you
>> are indexing, then it is something you should consider.  If people will
>> only ever search one type of document, you should probably keep them in
>> separate indexes to keep things cleaner.
>> 
>> Thanks,
>> Shawn
> 
> -- 
> Sorry for being brief. Alternate email is rickleir at yahoo dot com

