From: Jamie
Date: Fri, 20 Jun 2014 09:51:26 +0200
To: java-user@lucene.apache.org
Subject: Re: search performance

Hi All,

Thank you for all your suggestions. Some of the recommendations hadn't yet been implemented, as our code base was using older versions of Lucene with reduced capabilities. Thus far, all the recommendations for fast search have been implemented (e.g. using pagination with searchAfter, DirectoryReader.openIfChanged, avoiding wrapping Lucene ScoreDoc results, an option to disable sorting, etc.).

While search performance has improved significantly in some environments, in other, larger ones we are unfortunately still seeing search times of 1 to 5 minutes.

For instance, at one site the total index size is 500GB, with 190 million documents indexed. They are running a machine with 24 cores and 4 SSD drives to house the indexes. New emails are being added to the indexes at a rate of 10 messages/sec.

One possible area for improvement: searching is being conducted across several indexes. To accomplish this, on each search a MultiReader is constructed that consists of several subreaders created by the DirectoryReader.openIfChanged method. Only one of the indexes is updated frequently; the others are never updated.
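
The reader handling described above can be sketched roughly as follows. This is a plain-Java model, not Lucene code and not our production code: FakeIndex, FakeReader, and refresh are hypothetical stand-ins invented for illustration. The only behaviour borrowed from Lucene is that DirectoryReader.openIfChanged returns null when the index has not changed. Under those assumptions, with one live index and several static ones, only one subreader should need reopening per search.

```java
import java.util.ArrayList;
import java.util.List;

public class ReaderRefreshSketch {

    /** Hypothetical stand-in for one index; the version bumps on every added document. */
    static final class FakeIndex {
        final String name;
        long version = 0;
        FakeIndex(String name) { this.name = name; }
        void addDocument() { version++; }
    }

    /** Hypothetical stand-in for a point-in-time reader over one index. */
    static final class FakeReader {
        final FakeIndex index;
        final long version;
        FakeReader(FakeIndex index) {
            this.index = index;
            this.version = index.version;
        }
    }

    /** Models the openIfChanged contract: null means "unchanged, keep the old reader". */
    static FakeReader openIfChanged(FakeReader old) {
        return old.version == old.index.version ? null : new FakeReader(old.index);
    }

    /**
     * Refreshes the cached subreaders in place and returns how many were reopened.
     * In real Lucene code the replaced reader would also have to be closed.
     */
    static int refresh(List<FakeReader> readers) {
        int reopened = 0;
        for (int i = 0; i < readers.size(); i++) {
            FakeReader newer = openIfChanged(readers.get(i));
            if (newer != null) {
                readers.set(i, newer);
                reopened++;
            }
        }
        return reopened;
    }

    public static void main(String[] args) {
        FakeIndex live = new FakeIndex("live");
        List<FakeReader> readers = new ArrayList<>();
        readers.add(new FakeReader(live));
        readers.add(new FakeReader(new FakeIndex("archive-1")));
        readers.add(new FakeReader(new FakeIndex("archive-2")));

        System.out.println(refresh(readers)); // nothing has changed yet: prints 0
        live.addDocument();
        System.out.println(refresh(readers)); // only the live index changed: prints 1
    }
}
```

In this model the refreshed list plays the role of the subreader array handed to the MultiReader on each search.
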
For each search, a new IndexSearcher is created, with the MultiReader passed to its constructor. From what I've read, MultiReader and IndexSearcher are relatively lightweight and should not impact search performance. Is this correct?

Is there a faster way to handle searching across multiple indexes? What is the performance impact of searching across multiple indexes?

Am I correct that SearcherManager can't be used with a MultiReader and NRT?

I would appreciate all suggestions on how to optimize our search performance further. Search time has become a usability issue.

Much appreciated,

Jamie