lucene-solr-user mailing list archives

From mike anderson <saidthero...@gmail.com>
Subject Re: how well does multicore scale?
Date Wed, 27 Oct 2010 12:20:21 GMT
Tagging every document with a few hundred thousand 6-character user IDs
would increase the document size by two orders of magnitude. I can't
imagine why the index wouldn't grow by just as much (though I really
don't know much about the index file structure). By my simple math, if
we want each shard's index to fit in memory, then (even with some beefy
servers) each query would have to go out to a few thousand shards (as
opposed to 21 if we used the MultiCore approach). That would make the
typical response time much slower.
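
For reference, the arithmetic behind that estimate can be sketched roughly as follows. Only the 6-character ids and the 21 shards come from the message above; the 300,000 ids per document and the 18 KB base document size are assumed values chosen for illustration, and the linear relation between document size and index size is the assumption stated in the message, not a measured fact about Lucene.

```python
import math

# Back-of-the-envelope version of the estimate above.
USER_IDS_PER_DOC = 300_000   # assumed: "a few hundred thousand"
BYTES_PER_USER_ID = 6        # from the message (1 byte per character assumed)
BASE_DOC_BYTES = 18_000      # assumed average document size without tags
BASE_SHARDS = 21             # from the message: shard count with MultiCore

def growth_factor(base_bytes, ids_per_doc, bytes_per_id):
    """Ratio of tagged document size to untagged document size."""
    return (base_bytes + ids_per_doc * bytes_per_id) / base_bytes

def shards_needed(base_shards, factor):
    """Shard count if index size grew linearly with document size (assumption)."""
    return math.ceil(base_shards * factor)

factor = growth_factor(BASE_DOC_BYTES, USER_IDS_PER_DOC, BYTES_PER_USER_ID)
shards = shards_needed(BASE_SHARDS, factor)
print(f"growth factor: {factor:.0f}x, shards: {shards}")
```

With these assumed inputs the factor is about 100x and the shard count lands in the low thousands, which matches the "few thousand shards" figure. Whether the index really grows in proportion to document size is exactly the open question.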


-mike

On Tue, Oct 26, 2010 at 10:15 AM, Jonathan Rochkind <rochkind@jhu.edu> wrote:

> mike anderson wrote:
>
>> I'm really curious if there is a clever solution to the obvious problem
>> with: "So you're better off using a single index with a user id and use
>> a query filter with the user id when fetching data.", i.e., when you have
>> hundreds of thousands of user IDs tagged on each article. That just
>> doesn't sound like it scales very well.
>>
>>
> Actually, I think that design would scale fine; I don't think
> there's an 'obvious' problem. You store your userIDs in a multi-valued field
> (or as multiple terms in a single value, which ends up being similar) and fq
> on that field with the current userID. There's one way to find out, of
> course, but that doesn't seem a patently ridiculous scenario or anything.
> That's the kind of thing Solr is generally good at; it's what it's built
> for. The problem might actually be the time it takes to add such a document
> to the index, not query time.
>
> Doesn't mean it's the best solution for your problem though, I can't say.
>
> My impression is that Solr in general isn't really designed to support the
> kind of multi-tenancy use case people are talking about lately.  So trying
> to make it work anyway... if multi-cores work for you, then great, but be
> aware they weren't really designed for that (having thousands of cores) and
> may not. If a single index can work for you instead, great, but as you've
> discovered it's not necessarily obvious how to set up the schema to do what
> you need. Really this applies to Solr in general: unlike an RDBMS, where
> you just normalize everything to third normal form and figure it'll work for
> almost any use case that comes up, in Solr you generally need to custom-fit the
> schema for your particular use cases, sometimes being kind of clever to
> figure out the optimal way to do that.
>
> This is, I'd argue/agree, indeed kind of a disadvantage: setting up a Solr
> index takes more intellectual work than setting up an RDBMS. The trade-off
> is you get speed, and flexible ways to set up relevancy (that still perform
> well). It took a couple of decades for RDBMSs to get as brainless to use as
> they are; maybe in a couple more we'll have figured out ways to make indexing
> engines like Solr equally brainless, but not yet. Still, it's pretty
> damn easy for what it is; the Lucene/Solr folks have done a remarkable job.
>
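
The single-index approach described in the quoted reply (user IDs in a multi-valued field, filtered with fq) might look roughly like this on the query side. This is only a sketch: the field name `user_ids`, the host and port, and the sample query and user ID are all hypothetical, and it assumes the field is declared multi-valued in the schema. It only builds the request URL for Solr's standard select handler; it does not contact a server.

```python
from urllib.parse import urlencode

def build_select_url(base_url, user_query, user_id, field="user_ids"):
    """Build a Solr select URL that restricts results to one user's documents.

    `field` is a hypothetical multi-valued field holding the user IDs
    allowed to see each document.
    """
    params = {
        "q": user_query,              # the user's actual search
        "fq": f"{field}:{user_id}",   # filter query: only docs tagged with this user
        "wt": "json",                 # response format
    }
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical host, query, and user ID.
url = build_select_url("http://localhost:8983/solr", "title:lucene", "ab12cd")
print(url)
```

Putting the user restriction in fq rather than in q keeps it out of relevancy scoring and lets Solr cache the filter per user.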
