lucene-solr-user mailing list archives

From Anshum Gupta <ans...@anshumgupta.net>
Subject Re: clarification regarding shard splitting and composite IDs
Date Thu, 05 Feb 2015 05:41:23 GMT
Solr 5.0 has support for distributed IDF. Also, users having the same IDF
is orthogonal to the original question.

In general, document frequency is computed per shard only. If, for some
reason, a single user's documents are split across shards, the IDF used will
differ between docs on different shards.
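
To illustrate why per-shard document frequency changes the IDF, here is a
minimal sketch. It assumes the classic Lucene TF-IDF form,
1 + ln(numDocs / (docFreq + 1)); the function name and the shard numbers are
made up for illustration and are not taken from any real index.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF in the classic Lucene TF-IDF form (assumed here for illustration)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical numbers: the same term is rare on one shard, common on another.
idf_on_shard1 = classic_idf(doc_freq=10, num_docs=1000)
idf_on_shard2 = classic_idf(doc_freq=200, num_docs=1000)

# Without distributed IDF, each shard scores with its own value.
print(idf_on_shard1, idf_on_shard2)
```

The same term therefore contributes a different weight to the score depending
on which shard a document lives on.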

On Wed, Feb 4, 2015 at 9:06 PM, Dan Davis <dansmood@gmail.com> wrote:

> Doesn't relevancy for that assume that the IDF and TF for user1 and user2
> are not too different? SolrCloud still doesn't use a distributed IDF,
> correct?
>
> On Wed, Feb 4, 2015 at 7:05 PM, Gili Nachum <gilinachum@gmail.com> wrote:
>
> > Alright. So shard splitting and composite routing play nicely together.
> > Thank you Anshum.
> >
> > On Wed, Feb 4, 2015 at 11:24 AM, Anshum Gupta <anshum@anshumgupta.net>
> > wrote:
> >
> > > In one line, shard splitting doesn't depend on the routing mechanism,
> > > only on the hash range, so documents with the same prefix could end up
> > > split across shards.
> > >
> > > Here's an overview of routing in SolrCloud:
> > > * Routing happens based on a hash value.
> > > * The hash is calculated from the multiple parts of the routing key. In
> > > the case of A!B, the upper 16 bits are obtained from murmurhash(A) and
> > > the lower (LSB) 16 bits are obtained from murmurhash(B). This sends the
> > > docs to the right shard.
> > > * When querying using A!, all shards that contain hashes in the range
> > > from [16 bits of murmurhash(A)]0000 to [16 bits of murmurhash(A)]ffff
> > > are used.
> > >
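
The routing overview above can be sketched roughly as follows. This is only
an illustration under stated assumptions: zlib.crc32 stands in for
MurmurHash3 (so the values will not match what Solr computes), only the
two-part A!B form is handled, and the function names are hypothetical.

```python
import zlib
from typing import Tuple


def hash32(text: str) -> int:
    """Stand-in 32-bit hash (assumption: Solr uses MurmurHash3, not CRC32)."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id: str) -> int:
    """Upper 16 bits from hash(A), lower 16 bits from hash(B) for an id A!B."""
    if "!" not in doc_id:
        return hash32(doc_id)
    prefix, rest = doc_id.split("!", 1)
    return (hash32(prefix) & 0xFFFF0000) | (hash32(rest) & 0x0000FFFF)


def prefix_range(prefix: str) -> Tuple[int, int]:
    """Hash range covered by every id that starts with 'prefix!'."""
    upper = hash32(prefix) & 0xFFFF0000
    return upper, upper | 0x0000FFFF


lo, hi = prefix_range("user1")
h = composite_hash("user1!event42")
print(f"{h:08x} falls in {lo:08x}-{hi:08x}")
```

Every id with the same prefix lands in the same 65536-wide slice of the hash
space, which is why a query on A! only needs the shards overlapping that slice.
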
> > > When you split a shard, say for range 00000000 - ffffffff, it is split
> > > down the middle (by default), and over multiple splits, docs for the
> > > same A! prefix might end up on different shards, but the request routing
> > > should take care of that.
> > >
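
As a rough sketch of the split-then-route behaviour described above: the
code below splits a hash range at its midpoint and then finds which shards
overlap a prefix's hash range. It is a naive illustration, not Solr's actual
split logic; the shard names and ranges are hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (low, high) hash range


def split_range(lo: int, hi: int) -> Tuple[Range, Range]:
    """Naive midpoint split of an inclusive hash range (assumption)."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def overlapping_shards(shards: Dict[str, Range], lo: int, hi: int) -> List[str]:
    """Names of shards whose range intersects the inclusive range lo..hi."""
    return [name for name, (s_lo, s_hi) in shards.items()
            if s_lo <= hi and lo <= s_hi]


# A hypothetical shard that covers exactly one prefix's slice gets split.
left, right = split_range(0x12340000, 0x1234FFFF)
shards = {"shard1_0": left, "shard1_1": right,
          "shard2": (0x20000000, 0x2FFFFFFF)}

# A query on that prefix now has to visit both children.
print(overlapping_shards(shards, 0x12340000, 0x1234FFFF))
```

The point is that a query for A! is sent to every shard overlapping the
prefix's range, so results stay complete even when a prefix straddles a split.
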
> > > You can read more about routing here:
> > > https://lucidworks.com/blog/solr-cloud-document-routing/
> > > http://lucidworks.com/blog/multi-level-composite-id-routing-solrcloud/
> > >
> > > and shard splitting here:
> > > http://lucidworks.com/blog/shard-splitting-in-solrcloud/
> > >
> > >
> > > On Wed, Feb 4, 2015 at 12:59 AM, Gili Nachum <gilinachum@gmail.com>
> > > wrote:
> > >
> > > > Hi, I'm also interested. When using the composite ID, the _route_
> > > > information is not kept on the document itself, so to me it looks like
> > > > it's not possible, as the split API
> > > > <https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3>
> > > > doesn't have a relevant parameter to split correctly.
> > > > I could report back once I try it in practice.
> > > >
> > > > On Mon, Nov 10, 2014 at 7:27 PM, Ian Rose <ianrose@fullstory.com>
> > > > wrote:
> > > >
> > > > > Howdy -
> > > > >
> > > > > We are using composite IDs of the form <user>!<event>. This ensures
> > > > > that all events for a user are stored in the same shard.
> > > > >
> > > > > I'm assuming from the description of how composite ID routing works
> > > > > that if you split a shard, the "split point" of the hash range for
> > > > > that shard is chosen to maintain the invariant that all documents
> > > > > that share a routing prefix (before the "!") will still map to the
> > > > > same (new) shard. Is that accurate?
> > > > >
> > > > > A naive shard-split implementation (e.g. one that chose the hash
> > > > > range split point arbitrarily) could end up with "child" shards that
> > > > > split a routing prefix.
> > > > >
> > > > > Thanks,
> > > > > Ian
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Anshum Gupta
> > > http://about.me/anshumgupta
> > >
> >
>



-- 
Anshum Gupta
http://about.me/anshumgupta
