lucene-general mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: [PROPOSAL] index server project
Date Fri, 20 Oct 2006 01:23:58 GMT
On 10/19/06, Steven Parkes <steven_parkes@esseff.org> wrote:
> You mention partitioning of indexes, though mostly around delete. What
> about scalability of corpus size?

Definitely in scope.  Solr already has scalability of search volume
via searchers behind a load balancer, all getting their index from a
master.  The problem comes when an index is too big to get decent
latency for a single query, and that's when you need to partition the
index into "shards", to use Google's terminology.
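The message does not say how documents are assigned to shards. One common approach is to hash each document's unique key, which is sketched below. The function names `shard_for` and `partition` and the choice of MD5 are hypothetical illustrations, not Solr APIs. Routing by key means an update or delete for a given document always lands on the same shard, which matters for the delete case raised above.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document's unique key to a shard index (hypothetical scheme)."""
    # A stable hash is required: Python's built-in hash() on str is
    # randomized per process, so it would route the same key differently
    # on different machines or after a restart.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(doc_ids, num_shards: int):
    """Group document ids into num_shards buckets using shard_for."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(1000)]
    # Print how many documents each of 10 shards received.
    print([len(s) for s in partition(docs, 10)])
```

Note that plain modulo hashing reassigns most keys when `num_shards` changes, so adding capacity gracefully (raised further down) would need a different scheme, such as consistent hashing.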

> Would partitioning be effective for
> that, too?

Yes, to a certain extent.  At some point you run into network
bandwidth issues if you go deep into the rankings.
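The bandwidth problem follows from a simple count. A shard cannot know how its own documents rank globally, so to return global results `[start, start + rows)` every shard must send back its own top `start + rows` hits. Those hits are then merged and most are discarded. The helper name and the `start`/`rows` parameters below are illustrative assumptions.

```python
def docs_transferred(num_shards: int, start: int, rows: int) -> int:
    """Hits the merging client must receive to return global results
    [start, start + rows): each shard sends its own top start + rows."""
    return num_shards * (start + rows)


print(docs_transferred(10, 0, 10))      # first page of 10 hits: 100
print(docs_transferred(10, 10000, 10))  # page starting at 10000: 100100
```

Each page still shows only 10 hits, but the deep page moves about a thousand times as much data across the network.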

> What about scalability of ingest rate?

As it relates to indexing, I think Nutch already has that base covered.

> What are you thinking, in terms of size? Is this a 10 node thing?

I'm personally interested in perhaps 10 to 20 index shards, with
multiple replicas of each shard for HA and query load scalability.
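With several replicas per shard, a query needs exactly one live replica of every shard. Picking among them at random spreads query load, and skipping failed nodes gives HA. The sketch below assumes a simple in-memory map from shard name to replica hosts and a known set of down hosts. The shard and host names and the `pick_replicas` function are hypothetical.

```python
import random
from typing import Dict, List, Set


def pick_replicas(shards: Dict[str, List[str]], down: Set[str],
                  rng: random.Random) -> Dict[str, str]:
    """Choose one live replica per shard (hypothetical layout).

    Random choice among live replicas spreads query load; replicas in
    `down` are skipped. If every replica of a shard is down, the shard
    cannot be searched and an error is raised.
    """
    chosen = {}
    for shard, replicas in shards.items():
        live = [r for r in replicas if r not in down]
        if not live:
            raise RuntimeError(f"no live replica for {shard}")
        chosen[shard] = rng.choice(live)
    return chosen


layout = {"shard1": ["host-a", "host-b"], "shard2": ["host-c", "host-d"]}
print(pick_replicas(layout, down={"host-a"}, rng=random.Random(42)))
```

A real deployment would also need to decide whether to fail the query or return partial results when a whole shard is unavailable. This sketch simply raises.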

> A 1000
> node thing? More? Bigger is cool, but raises a lot of issues.

Should be possible, but I won't personally be looking for that.  I
think scaling effectively will be partially in the hands of the client
and how it chooses to merge results from shards.
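As one illustration of the client-side merge, assume each shard returns its hits as `(score, doc_id)` pairs sorted by descending score. The client can then k-way merge the lists and cut out the requested page. Comparing raw scores across shards assumes those scores are comparable. That holds only roughly unless term statistics are shared between shards. The function name and hit format are hypothetical.

```python
import heapq
from itertools import islice
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, doc_id): hypothetical per-shard hit format


def merge_shard_results(shard_results: List[List[Hit]],
                        start: int, rows: int) -> List[Hit]:
    """K-way merge of per-shard hit lists (each sorted by descending score),
    returning the global slice [start, start + rows)."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, start, start + rows))


shard_a = [(0.9, "a1"), (0.5, "a2"), (0.1, "a3")]
shard_b = [(0.8, "b1"), (0.7, "b2"), (0.2, "b3")]
print(merge_shard_results([shard_a, shard_b], 0, 3))
# [(0.9, 'a1'), (0.8, 'b1'), (0.7, 'b2')]
```

`heapq.merge` consumes the inputs lazily, so the merge stops once `start + rows` hits have been produced.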

> How
> dynamic? Can nodes come and go?

Unplanned: yes.  HA is personally key for me.
Planned (adding capacity gracefully): it would be nice.  I actually
hadn't planned it for Solr.

> Are you going to assume homogeneity of
> nodes?

Hardware homogeneity?  That might be out of scope... I'd start off
without worrying about it in any case.

> What about add/modify/delete to search visibility latency? Close to
> batch/once-a-day or real-time?

Anywhere in between, I'd think.  "Realtime" latencies of minutes or
longer are normally fine.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
