Date: Thu, 19 Oct 2006 21:23:58 -0400
From: "Yonik Seeley"
To: general@lucene.apache.org
Subject: Re: [PROPOSAL] index server project

On 10/19/06, Steven Parkes wrote:
> You mention partitioning of indexes, though mostly around delete. What
> about scalability of corpus size?

Definitely in scope. Solr already has scalability of search volume via searchers behind a load balancer, all getting their index from a master. The problem comes when an index is too big to get decent latency for a single query, and that's when you need to partition the index into "shards", to use Google terminology.

> Would partitioning be effective for
> that, too?

Yes, to a certain extent. At some point you run into network bandwidth issues if you go deep into rankings.

> What about scalability of ingest rate?

As it relates to indexing, I think Nutch already has that base covered.

> What are you thinking, in terms of size? Is this a 10 node thing?

I'm personally interested in perhaps 10 to 20 index shards, with multiple replicas of each shard for HA and query load scalability.

> A 1000
> node thing? More?

Bigger is cool, but raises a lot of issues. Should be possible, but I won't personally be looking for that. I think scaling effectively will be partially in the hands of the client and how it chooses to merge results from shards.

> How
> dynamic?
> Can nodes come and go?

Unplanned: yes. HA is personally key for me.
Planned (adding capacity gracefully): it would be nice. I actually hadn't planned it for Solr.

> Are you going to assume homogeneity of
> nodes?

Hardware homogeneity? That might be out of scope... I'd start off without worrying about it in any case.

> What about add/modify/delete to search visibility latency? Close to
> batch/once-a-day or real-time?

Anywhere in between I'd think. "Realtime" latencies of minutes or longer are normally fine.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
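
To illustrate the points above about the client merging results from shards, and about bandwidth when going deep into rankings, here is a minimal sketch. It is not Solr or Lucene code: the (score, doc_id) hit tuples and the function names merge_shard_results and hits_transferred are hypothetical, and it assumes each shard returns its hits sorted by descending score with scores that are comparable across shards.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical hit representation: (score, doc_id). Not a Solr or Lucene type.
Hit = Tuple[float, str]


def merge_shard_results(shard_results: Iterable[List[Hit]],
                        offset: int, rows: int) -> List[Hit]:
    """Merge per-shard hit lists into one global page of results.

    Assumes every list is already sorted by descending score and that
    scores from different shards can be compared directly.
    """
    # heapq.merge does a lazy k-way merge; negating the score makes it
    # yield the highest-scoring hits first.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, offset, offset + rows))


def hits_transferred(num_shards: int, offset: int, rows: int) -> int:
    """Worst-case number of hits sent to the merging client.

    Any shard might hold all of the top results, so each shard has to
    return its own top (offset + rows) hits.
    """
    return num_shards * (offset + rows)


shard_a = [(0.9, "a1"), (0.5, "a2"), (0.1, "a3")]
shard_b = [(0.8, "b1"), (0.7, "b2")]

first_page = merge_shard_results([shard_a, shard_b], offset=0, rows=3)
deep_cost = hits_transferred(num_shards=20, offset=10000, rows=10)
```

With 20 shards, asking for 10 rows starting at offset 10000 means the client may receive 200200 hits in order to show 10, which is the kind of cost that grows as queries go deeper into the rankings.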