From: Mike Sokolov
Date: Tue, 10 May 2011 09:02:40 -0400
To: java-user@lucene.apache.org
CC: Toke Eskildsen
Subject: Re: Sharding Techniques

> Down to basics, Lucene searches work by locating terms and resolving
> documents from them. For standard term queries, a term is located by a
> process akin to binary search. That means that it uses log(n) seeks to
> get the term. Let's say you have 10M terms in your corpus. If you stored
> that in a single field in a single index with a single segment, it would
> take log(10M) ~= 24 seeks to locate a term. This is of course very
> simplified.
>
> When you have 63 indexes, log(n) works against you. Even with the
> unrealistic assumption that the 10M terms are evenly distributed and
> without duplicates, the number of seeks for a search that hits all parts
> will still be 63 * log(10M/63) ~= 63 * 18 = 1134. And we haven't even
> begun to estimate the merging part.

This is true, but if the indexes are kept on 63 separate servers, those
seeks will be carried out in parallel. The OP did indicate his indexes
would be on different servers, I think?

I still agree with your overall point - at this scale a single server is
probably best. And if there are performance issues, I think the usual
approach is to create multiple mirrored copies (slaves) rather than
sharding. Sharding is useful for very large indexes: indexes too big to
store on disk and cache in memory on one commodity box.

-Mike
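
The seek arithmetic discussed above can be checked with a small sketch. It assumes, as the quoted estimate does, an idealized binary search costing ceil(log2(n)) seeks per index and an even split of terms across shards; it ignores merging, caching and network cost. The class and method names are hypothetical and are not part of Lucene.

```java
// Back-of-the-envelope seek estimate. This is a simplified model of the
// numbers in the thread, not Lucene's actual term lookup.
class SeekEstimate {

    // ceil(log2(terms)) for terms >= 1, using integer arithmetic only.
    static int seeksPerIndex(long terms) {
        if (terms < 1) {
            throw new IllegalArgumentException("terms must be >= 1");
        }
        return 64 - Long.numberOfLeadingZeros(terms - 1);
    }

    // Total seeks performed across all shards (the work done overall).
    static long totalSeeks(long terms, int shards) {
        return (long) shards * seeksPerIndex(terms / shards);
    }

    // Seeks on the critical path if every shard sits on its own server
    // and all shards are searched at the same time.
    static int parallelSeeks(long terms, int shards) {
        return seeksPerIndex(terms / shards);
    }

    public static void main(String[] args) {
        long terms = 10_000_000L;
        int shards = 63;
        System.out.println("single index:          " + seeksPerIndex(terms));
        System.out.println("63 shards, total:      " + totalSeeks(terms, shards));
        System.out.println("63 shards, in parallel: " + parallelSeeks(terms, shards));
    }
}
```

Under these assumptions the single index needs 24 seeks, the 63 shards need 1134 seeks in total, and a fully parallel search has about 18 seeks on its critical path before any merging of results.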