lucene-solr-user mailing list archives

From Dan Davis <dansm...@gmail.com>
Subject Re: clarification regarding shard splitting and composite IDs
Date Thu, 05 Feb 2015 05:06:17 GMT
Doesn't relevancy in that case assume that the IDF and TF for user1 and
user2 are not too different? SolrCloud still doesn't use a distributed IDF,
correct?

On Wed, Feb 4, 2015 at 7:05 PM, Gili Nachum <gilinachum@gmail.com> wrote:

> Alright. So shard splitting and composite routing play nicely together.
> Thank you Anshum.
>
> On Wed, Feb 4, 2015 at 11:24 AM, Anshum Gupta <anshum@anshumgupta.net>
> wrote:
>
> > In one line, shard splitting doesn't depend on the routing mechanism,
> > only on the hash range, so documents for the same prefix could end up
> > split across shards.
> >
> > Here's an overview of routing in SolrCloud:
> > * Happens based on a hash value
> > * The hash is calculated from the multiple parts of the routing key. In
> > case of A!B, the upper 16 bits are obtained from murmurhash(A) and the
> > lower (LSB) 16 bits are obtained from murmurhash(B). This sends the docs
> > to the right shard.
> > * When querying using A!, all shards that contain hashes in the range
> > murmurhash(A)-0000 to murmurhash(A)-ffff (the 16 bits taken from
> > murmurhash(A), followed by 0000 through ffff) are used.
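
A minimal Python sketch of the hash composition described in the list above.
It is only an illustration under stated assumptions: CRC32 stands in for
MurmurHash3 (which is not in the Python standard library), the upper 16 bits
of hash(A) are assumed to be the ones kept, and hashes are treated as
unsigned. The names h32, composite_hash and prefix_range are hypothetical,
not Solr APIs.

```python
import zlib


def h32(s: str) -> int:
    """Stand-in 32-bit hash (CRC32). Assumption: Solr itself uses MurmurHash3."""
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id: str) -> int:
    """Hash for an id of the form A!B: high 16 bits from A, low 16 bits from B."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:
        # No "!" in the id: hash the whole id.
        return h32(doc_id)
    return (h32(prefix) & 0xFFFF0000) | (h32(rest) & 0x0000FFFF)


def prefix_range(prefix: str) -> tuple:
    """Inclusive hash range that every id of the form prefix!... falls into."""
    high = h32(prefix) & 0xFFFF0000
    return high, high | 0x0000FFFF


low, high = prefix_range("user1")
# Any document with the "user1!" prefix hashes inside [low, high].
print(hex(low), hex(composite_hash("user1!event42")), hex(high))
```

A query for user1! then only needs the shards whose hash ranges overlap
[low, high].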
> >
> > When you split a shard, say for the range 00000000 - ffffffff, it is
> > split in the middle (by default), and over multiple splits, docs for the
> > same A! prefix might end up on different shards, but the request routing
> > should take care of that.
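
A rough Python sketch of the midpoint splitting described above, showing how
repeated splits can eventually cut through the hash range of a single A!
prefix. It assumes unsigned 32-bit ranges and an exact midpoint split; the
prefix range 12340000 - 1234ffff and the function names are made up for
illustration and are not Solr code.

```python
def split_range(lo: int, hi: int) -> tuple:
    """Split an inclusive hash range in the middle into two child ranges."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def overlapping(shards: list, lo: int, hi: int) -> list:
    """Shards whose ranges intersect [lo, hi]; a query for the prefix must hit all of them."""
    return [s for s in shards if s[0] <= hi and lo <= s[1]]


def splits_until_straddled(prefix_lo: int, prefix_hi: int, max_splits: int = 32):
    """Keep splitting the shard that holds the prefix until the prefix spans two shards."""
    shards = [(0x00000000, 0xFFFFFFFF)]
    for n in range(1, max_splits + 1):
        target = overlapping(shards, prefix_lo, prefix_hi)[0]
        shards.remove(target)
        shards.extend(split_range(*target))
        hit = overlapping(shards, prefix_lo, prefix_hi)
        if len(hit) > 1:
            return n, hit
    return None


# Hypothetical prefix whose 16 high bits are 0x1234.
n, hit = splits_until_straddled(0x12340000, 0x1234FFFF)
print(n, [(hex(a), hex(b)) for a, b in hit])
```

Once the prefix range straddles two child shards, a request for that prefix
has to be sent to both, which is the job of the request routing mentioned
above.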
> >
> > You can read more about routing here:
> > https://lucidworks.com/blog/solr-cloud-document-routing/
> > http://lucidworks.com/blog/multi-level-composite-id-routing-solrcloud/
> >
> > and shard splitting here:
> > http://lucidworks.com/blog/shard-splitting-in-solrcloud/
> >
> >
> > On Wed, Feb 4, 2015 at 12:59 AM, Gili Nachum <gilinachum@gmail.com> wrote:
> >
> > > Hi, I'm also interested. When using the composite ID, the _route_
> > > information is not kept on the document itself, so to me it looks like
> > > it's not possible, as the split API
> > > <https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3>
> > > doesn't have a relevant parameter to split correctly.
> > > I could report back once I try it in practice.
> > >
> > > On Mon, Nov 10, 2014 at 7:27 PM, Ian Rose <ianrose@fullstory.com> wrote:
> > >
> > > > Howdy -
> > > >
> > > > We are using composite IDs of the form <user>!<event>.  This ensures
> > > > that all events for a user are stored in the same shard.
> > > >
> > > > I'm assuming from the description of how composite ID routing works
> > > > that if you split a shard, the "split point" of the hash range for
> > > > that shard is chosen to maintain the invariant that all documents that
> > > > share a routing prefix (before the "!") will still map to the same
> > > > (new) shard.  Is that accurate?
> > > >
> > > > A naive shard-split implementation (e.g. one that chose the hash range
> > > > split point arbitrarily) could end up with "child" shards that split a
> > > > routing prefix.
> > > >
> > > > Thanks,
> > > > Ian
> > > >
> > >
> >
> >
> >
> > --
> > Anshum Gupta
> > http://about.me/anshumgupta
> >
>
