From: "Jack Krupansky"
Subject: Re: Performance issue when using multiple PhraseQueries against a 1+ million entries index
Date: Mon, 19 May 2014 16:28:00 -0400

Does your index fit fully in system memory - the OS file cache? If not, there could be a lot of thrashing (I/O) as Lucene accesses the index.
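
One rough way to check this is to compare the on-disk size of the index directory with the memory that is left for the OS file cache once the JVM heap and other processes are accounted for. The following is only a minimal sketch using the standard library: the class name IndexCacheCheck, the default "index" path and the 4096 MB budget are assumptions for illustration, not values from this thread, and the budget has to be supplied by you since the program does not measure free memory itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexCacheCheck {

    // Sum the sizes of all regular files under the index directory.
    static long indexSizeBytes(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.walk(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    // True if the index is no larger than the memory assumed to be
    // available to the OS file cache.
    static boolean fitsInCache(long indexBytes, long cacheBudgetBytes) {
        return indexBytes <= cacheBudgetBytes;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults: index path "index", cache budget 4096 MB.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "index");
        long budgetMb = args.length > 1 ? Long.parseLong(args[1]) : 4096L;

        if (!Files.isDirectory(indexDir)) {
            System.out.println("usage: IndexCacheCheck <indexDir> <cacheBudgetMb>");
            return;
        }

        long sizeBytes = indexSizeBytes(indexDir);
        long budgetBytes = budgetMb * 1024L * 1024L;

        System.out.println("index size (MB):   " + sizeBytes / (1024L * 1024L));
        System.out.println("cache budget (MB): " + budgetMb);
        System.out.println(fitsInCache(sizeBytes, budgetBytes)
                ? "index should fit in the file cache"
                : "index is larger than the cache budget; expect disk I/O");
    }
}
```

Run it as, for example, java IndexCacheCheck /path/to/index 8192. If the index turns out to be larger than the budget, each of the many phrase queries may have to read postings and positions from disk, which would be consistent with response times in the tens of seconds.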
-- Jack Krupansky

-----Original Message-----
From: Liviu Matei
Sent: Monday, May 19, 2014 4:21 PM
To: java-user@lucene.apache.org
Subject: Performance issue when using multiple PhraseQueries against a 1+ million entries index

Hi,

In order to achieve a somewhat "smarter" search that also takes context into consideration, I decided to use PhraseQuery. I create ~100 phrase queries from the input text, combine them with a boolean query into one query, and issue a search against the index. If the index is big (1+ million entries with a lot of content), I run into performance problems: response times of ~30 seconds, which is not acceptable.

Can you please tell me if there is a way to tune the PhraseQueries? Or is there another way to improve performance besides reducing the number of queries? I've read a little about N-Gram queries, but I am not sure whether they are suitable in this scenario.

Thanks and regards,
Liviu