Date: Tue, 24 Jul 2007 17:28:36 -0400
From: "Askar Zaidi"
To: java-user@lucene.apache.org
Subject: Re: Fine Tuning Lucene implementation

Thanks for the reply.

I am timing the entire search process with a stopwatch, a bit ghetto
style. My getXXX methods are:

    Document doc = hits.doc(i);
    String str = doc.get("item");

So you can see that I am retrieving the entire document for each hit.
Ideally, I'd like to retrieve only the Field object that I want to run
the search on. I know this will give me a boost, as one of my Fields is
really huge.

My SQL query selects the user's entire data set from the database. I'd
also like to do some SQL-based searching in that query, so that I pick
only those items where the phrase matches.

The index contains about 650MB of data. Index file size is 14478869
bytes.

thanks,
AZ

On 7/24/07, Grant Ingersoll wrote:
>
> Where are you getting your numbers from? That is, where are your
> timers? Are you timing the rs.next() loop, or the individual calls
> to Lucene? What do the getXXXXX methods look like? How big are your
> queries? How big is your index?
>
> Essentially, we need more info to really help you. From what I can
> tell, you are generating 3 different Lucene queries for each record
> in the database. Frankly, I'm surprised your slowdown is only linear.
>
> On Jul 24, 2007, at 4:31 PM, Askar Zaidi wrote:
>
> > I have 512MB RAM allocated to JVM Heap. If I double my system RAM
> > from 768MB to say 2GB or so, and give JVM 1.5GB Heap space, will I
> > get quicker results?
> >
> > Can I expect results which take 1 minute to be returned in 30
> > seconds with more RAM? Should I also get a more powerful CPU? A
> > real server class machine?
> >
> > I have also done some of the optimizations that are mentioned on
> > the Lucene website.
> >
> > thanks,
> > AZ
> >
> > On 7/24/07, Askar Zaidi wrote:
> >>
> >> Hey Guys,
> >>
> >> I just finished up using Lucene in my application. I have data in a
> >> database, so while indexing I extract this data from the database
> >> and pump it into the index. Specifically, I have the following data
> >> in the index:
> >>
> >> <itemID> <tags> <title> <summary> <contents>
> >>
> >> where itemID is just a number (primary key in the DB)
> >> tags: text
> >> title: text
> >> summary: text
> >> contents: Huge text (text extracted from files: pdfs, docs etc).
> >>
> >> Now while running a search query I realized that the response time
> >> increases in a linear fashion as the number of <itemID> increases
> >> in the DB.
> >>
> >> If I have 50 items, it's 8 seconds
> >> 100 items, it's 17 seconds.
> >> 300+ items, it's 60 seconds and maybe more.
> >>
> >> In a perfect world, I'd like to search on 300+ items within 10-15
> >> seconds. Can anyone give me tips to fine tune Lucene?
> >>
> >> Here's a code snippet:
> >>
> >> sql query = "SELECT itemID from items where creator = 'askar'";
> >>
> >> --execute query--
> >>
> >> while(rs.next()){
> >>
> >> score = doTagSearch(askar,text,itemID);
> >> scoreTitle = doTitleSearch(askar,text,itemID);
> >> scoreSummary = doSummarySearch(askar,text,itemID);
> >>
> >> ----
> >>
> >> }
> >>
> >> So this code asks Lucene to search for the "text" in the itemID
> >> passed. itemID is already indexed.
> >> The while loop will run 300 times if there are 300 items... that
> >> gets slow... what can I do here?
> >>
> >> thanks for the replies,
> >>
> >> AZ
> >>
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
>
> Read the Lucene Java FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
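
On Grant's question about where the timers are: one stopwatch around the whole search cannot tell the rs.next() loop apart from the individual Lucene calls. Below is a minimal sketch of per-phase timing in plain Java (8 or later, for the lambdas). It is not from the thread: PhaseTimer, TimingDemo and fakeSearch are hypothetical names, and fakeSearch only stands in for doTagSearch / doTitleSearch / doSummarySearch, whose bodies were never posted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Accumulates elapsed wall-clock time and call counts per named phase. */
class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();
    private final Map<String, Integer> callsByPhase = new LinkedHashMap<>();

    /** Runs the work, charges its elapsed time to the phase, returns its result. */
    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            nanosByPhase.merge(phase, elapsed, Long::sum);
            callsByPhase.merge(phase, 1, Integer::sum);
        }
    }

    long totalNanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    int calls(String phase) {
        return callsByPhase.getOrDefault(phase, 0);
    }

    /** One line per phase, in the order the phases were first seen. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
            int n = callsByPhase.get(e.getKey());
            sb.append(String.format("%-16s calls=%d total=%.1f ms%n",
                    e.getKey(), n, e.getValue() / 1e6));
        }
        return sb.toString();
    }
}

class TimingDemo {
    // Hypothetical stand-in for the real per-item Lucene search methods.
    static float fakeSearch(int itemId) {
        return itemId * 0.5f;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        int[] itemIds = {1, 2, 3}; // stands in for the rs.next() loop
        float sum = 0f;
        for (int id : itemIds) {
            sum += timer.time("tagSearch", () -> fakeSearch(id));
            sum += timer.time("titleSearch", () -> fakeSearch(id));
            sum += timer.time("summarySearch", () -> fakeSearch(id));
        }
        System.out.println("sum of scores = " + sum);
        System.out.print(timer.report());
    }
}
```

Wrapping the whole while loop in one more timer.time(...) call and subtracting the three search totals would show roughly how much time goes to the database side rather than to Lucene.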