From: Morus Walter
To: "Lucene Users List"
Subject: Re: Search performance with one index vs. many indexes
Date: Mon, 28 Feb 2005 08:30:21 +0100

Jochen Franke writes:
> Topic: Search performance with large numbers of indexes vs. one large index
>
> My questions are:
>
> - Is the size of the "wordlist" the problem?
> - Would we be a lot faster, when we have a smaller number
>   of files per index?

Sure. Look:

Index lookup of a word is O(ln(n)), where n is the number of words.
Index lookup of a word in k indexes having m words each is O(k ln(m)).

In the best case all word lists are distinct (purely theoretical),
that is, n = k*m or m = n/k.

For n = 15 million, k = 800:
    ln(n)     = 16.5
    k*ln(n/k) = 7871

In a realistic case, m is much bigger, since the word lists won't be distinct.
But it's the linear factor k that bites you.
In the worst case (all words in all indexes) you have
    k*ln(n)   = 13218.8

HTH
    Morus
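The figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this message, not a measurement and not anything from the Lucene API. The function name `lookup_costs` is made up for illustration, and the results are unitless estimates that ignore constant factors.

```python
import math


def lookup_costs(n, k):
    """Return rough cost estimates for looking up one word.

    n: total number of distinct words across all indexes
    k: number of indexes
    """
    single = math.log(n)        # one index holding all n words
    best = k * math.log(n / k)  # k indexes with disjoint word lists, m = n/k
    worst = k * math.log(n)     # k indexes, every word in every index, m = n
    return single, best, worst


single, best, worst = lookup_costs(n=15_000_000, k=800)
print(f"one index:             {single:.1f}")
print(f"k indexes, best case:  {best:.0f}")
print(f"k indexes, worst case: {worst:.1f}")
```

Best and worst case differ by less than a factor of two, while both are several hundred times the single-index estimate, so the number of indexes k dominates.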