
Date: Tue, 24 Jul 2007 16:31:30 -0400
From: "Askar Zaidi"
To: java-user@lucene.apache.org
Subject: Re: Fine Tuning Lucene implementation

I have 512MB of RAM allocated to the JVM heap. If I raise my system RAM from 768MB to, say, 2GB and give the JVM a 1.5GB heap, will I get quicker results?

Can I expect results that currently take 1 minute to be returned in 30 seconds with more RAM? Should I also get a more powerful CPU? A real server-class machine?

I have also done some of the optimizations that are mentioned on the Lucene website.

thanks,

AZ

On 7/24/07, Askar Zaidi wrote:
>
> Hey Guys,
>
> I just finished up using Lucene in my application. I have data in a
> database, so while indexing I extract this data from the database and pump
> it into the index. Specifically, I have the following data in the index:
>
> <itemID> <tags> <title> <summary> <contents>
>
> where itemID is just a number (primary key in the DB)
> tags: text
> title: text
> summary: text
> contents: huge text (text extracted from files: PDFs, docs, etc.)
>
> Now while running a search query I realized that the response time
> increases in a linear fashion as the number of <itemID> increases in the DB.
>
> If I have 50 items, it's 8 seconds
> 100 items, it's 17 seconds.
> 300+ items, it's 60 seconds and maybe more.
>
> In a perfect world, I'd like to search on 300+ items within 10-15 seconds.
> Can anyone give me tips to fine tune Lucene?
>
> Here's a code snippet:
>
> sql query = "SELECT itemID from items where creator = 'askar'";
>
> --execute query--
>
> while(rs.next()){
>
> score = doTagSearch(askar,text,itemID);
> scoreTitle = doTitleSearch(askar,text,itemID);
> scoreSummary = doSummarySearch(askar,text,itemID);
>
> ----
>
> }
>
> So this code asks Lucene to search for the "text" in the itemID passed.
> itemID is already indexed. The while loop will run 300 times if there are
> 300 items... that gets slow... what can I do here?
>
> thanks for the replies,
>
> AZ
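
To put numbers on the linear growth described in the quoted message, here is a rough back-of-the-envelope sketch in plain Java. It uses only the three timings reported above. The class and method names are made up for illustration, and it assumes exactly three Lucene searches per item (tags, title, summary), which is only what the visible part of the loop shows; the elided "----" section may add more.

```java
// Hypothetical helper, not part of the original application.
class SearchTimingEstimate {
    // Timings reported in the message: {number of items, seconds}.
    static final int[][] REPORTED = { {50, 8}, {100, 17}, {300, 60} };

    // Assumption: the loop body issues three searches per item
    // (doTagSearch, doTitleSearch, doSummarySearch).
    static final int SEARCHES_PER_ITEM = 3;

    // Average wall-clock seconds spent on each item.
    static double secondsPerItem(int items, int seconds) {
        return (double) seconds / items;
    }

    // Total search calls made by the loop for a given item count.
    static int searchCalls(int items) {
        return items * SEARCHES_PER_ITEM;
    }

    // Average milliseconds per individual search call.
    static double millisPerSearch(int items, int seconds) {
        return seconds * 1000.0 / searchCalls(items);
    }

    // Per-item budget needed to finish all items within targetSeconds.
    static double requiredSecondsPerItem(int items, int targetSeconds) {
        return (double) targetSeconds / items;
    }

    public static void main(String[] args) {
        for (int[] row : REPORTED) {
            System.out.printf("%d items: %.2f s/item, %d searches, %.1f ms/search%n",
                    row[0], secondsPerItem(row[0], row[1]),
                    searchCalls(row[0]), millisPerSearch(row[0], row[1]));
        }
        System.out.printf("Budget for 300 items in 15 s: %.3f s/item%n",
                requiredSecondsPerItem(300, 15));
    }
}
```

Under these assumptions the cost stays at roughly 0.16 to 0.20 seconds per item across all three measurements, which matches the linear growth reported. Searching 300 items in 15 seconds would need about 0.05 seconds per item, around a quarter of the current per-item cost.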