From: "De Simone, Alessandro"
To: "java-user@lucene.apache.org"
Subject: RE: search time & number of segments
Date: Mon, 19 May 2014 09:54:46 +0000
Thank you for your input.

> How much RAM does your search machine have?

We have 16GB of RAM, and there is at least 8GB of free memory for the OS file cache. The cache is working pretty well.

> That sounds right. Although each segment is 1/16 of the full index size, the number of seeks per segment is not 1/16: Larger indexes require relatively fewer seeks. Think binary search and log(values_in_field), although that is highly simplified.

The "IO calls" I was referring to are the number of times the "BufferedIndexInput.refill()" function is called. So 16 times more bytes are read when there are 16 segments, for exactly the same result.

I would agree that seeks are to blame if Lucene were reading roughly the same number of bytes and just performing worse. In fact, that's exactly what I was expecting. But that is not the case here.

It's almost as if extracting the term stats (or whatever metadata the segment has) is more costly than the search itself. And I'm not talking about queries with few results.

> I am guessing that you are using spinning drives and that there is not much RAM in the machine?

As you can see, we have a lot of RAM. Using the resource manager, I see that nothing is thrashing the system or swapping to disk. Lucene is just a lot slower for every query. When the data for a query is already in the OS cache, the call takes a few milliseconds, as expected.

Alessandro De Simone

-----Original Message-----
From: Toke Eskildsen [mailto:te@statsbiblioteket.dk]
Sent: Saturday, 17 May 2014 20:04
To: java-user@lucene.apache.org
Subject: RE: search time & number of segments

De Simone, Alessandro [Alessandro.DeSimone@bvdinfo.com] wrote:
> We have a performance issue ever since we stopped optimizing the index.
> We are using Lucene 4.8 (32-bit JVM for searching, 64-bit for indexing) on Windows 2008R2.

How much RAM does your search machine have?

> For instance, a search with (2 TermQuery + 1 SpanQuery) x 6 fields made 143 IO calls. Now with 16 segments we have 2432 IO calls and the search time is really bad. [...]

That sounds right. Although each segment is 1/16 of the full index size, the number of seeks per segment is not 1/16: Larger indexes require relatively fewer seeks. Think binary search and log(values_in_field), although that is highly simplified.

> The size of the index is ~24GB (14 million documents). No fields are stored, only indexed.

Normally the penalty of running un-optimized is not that great, so it sounds like your machine cannot provide the I/O speed it needs (as opposed to having a large logistics overhead from the multiple segments). I am guessing that you are using spinning drives and that there is not much RAM in the machine? The easy solution is either to throw RAM at the problem or to switch to SSD.

- Toke Eskildsen

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
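Toke's "think binary search and log(values_in_field)" argument can be made concrete with a little arithmetic. The Java sketch below is a toy model only, under loudly stated assumptions: it treats each segment's term lookup as a plain binary search (Lucene's real terms dictionary does not work this way), it assumes roughly one distinct term per document using the 14 million documents mentioned in the thread, and `SegmentSeekModel` is a hypothetical name. It compares the halving steps needed to find one term in a single segment against the total for 16 equal segments.

```java
import java.util.Locale;

/**
 * Toy model (hypothetical, not Lucene's implementation): finding a term costs
 * about log2(n) steps in a sorted dictionary of n entries, so splitting the
 * dictionary into k equal segments costs k * log2(n / k) steps instead.
 */
public class SegmentSeekModel {

    /** Halving steps to narrow n sorted entries down to one: ceil(log2(n)). */
    static int binarySearchSteps(long n) {
        // For n >= 2, the bit length of (n - 1) equals ceil(log2(n)).
        return n <= 1 ? 0 : 64 - Long.numberOfLeadingZeros(n - 1);
    }

    /** Total steps to look up one term in every one of k equal-sized segments. */
    static long stepsAcrossSegments(long totalTerms, int segments) {
        long perSegment = (totalTerms + segments - 1) / segments; // ceiling division
        return (long) segments * binarySearchSteps(perSegment);
    }

    public static void main(String[] args) {
        // Assumption: about one distinct term per document (14 million docs from the thread).
        long terms = 14_000_000L;
        long one = stepsAcrossSegments(terms, 1);
        long sixteen = stepsAcrossSegments(terms, 16);
        // Locale.ROOT keeps the decimal separator a '.' regardless of system locale.
        System.out.println(String.format(Locale.ROOT,
                "1 segment: %d steps, 16 segments: %d steps (%.1fx)",
                one, sixteen, (double) sixteen / one));
        // prints: 1 segment: 24 steps, 16 segments: 320 steps (13.3x)
    }
}
```

Under these assumptions, 16 segments need roughly 13 times as many lookup steps as one segment, rather than the same number. For comparison, the measured IO calls in the thread went from 143 to 2432, about 17x; the toy model does not attempt to explain that figure.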