From: "Somnath Banerjee"
To: java-user@lucene.apache.org
Subject: Long Query Performance
Date: Mon, 22 Jan 2007 17:04:42 +0530
Message-ID: <17e2e1b50701220334g4fdf6497h311e5844abb5f6f6@mail.gmail.com>

Hi All,

I have created an 8 GB index of almost 2 million documents. My requirement is to run nearly 0.72 million queries against this index. Each query consists of 200-400 words. I build a BooleanQuery by ORing these words together, but each query takes nearly 5-10 seconds to execute (2.78 GHz, 1.5 GB RAM). That means the entire batch of 0.72M queries will take more than 70 days to execute.

Is this expected, or is there a way to improve the performance? From earlier posts I gathered that complex queries are expected to take more time (but this much?). I have tried some of the improvements mentioned in other posts (e.g. increasing the JVM heap space) without much benefit.

Please let me know if you can think of any optimization technique, given that my requirement is to execute all those queries in a batch run (additional hardware is not an option for me). Also, I only need the top 150-200 results for each query. Can that be used to speed up the process?

In case I'm doing something wrong, here is how I construct the query, followed by a few lines of the log:

    IndexSearcher sh = new IndexSearcher("IndexPath");
    for each query {
        BooleanQuery bq = new BooleanQuery();
        for each word in the query text {
            // tktext is the current word; every word becomes an optional (OR) clause
            bq.add(new TermQuery(new Term("text", tktext)), BooleanClause.Occur.SHOULD);
        }
        sh.search(bq);
    }
    sh.close();

Performance log (query length = number of words; time in milliseconds):

    Query Length: 332  Time Taken: 8609
    Query Length: 276  Time Taken: 5172
    Query Length: 345  Time Taken: 9313

Thanks in advance,
Somnath
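
As a rough check on the batch-time estimate in the message, the arithmetic can be sketched in a few lines of Java. This is only a back-of-the-envelope sketch: it assumes the queries run strictly one after another and that the three logged timings are representative, neither of which the message confirms. The class and method names (BatchEstimate, meanMillis, estimateDays) are hypothetical and exist only for this illustration.

```java
// Back-of-the-envelope projection of the batch run time.
// Assumptions (not from the original message): sequential execution,
// and the three logged timings being representative of all queries.
class BatchEstimate {

    /** Mean of the sampled per-query latencies, in milliseconds. */
    static double meanMillis(long[] samplesMillis) {
        long sum = 0;
        for (long s : samplesMillis) {
            sum += s;
        }
        return (double) sum / samplesMillis.length;
    }

    /** Projected wall-clock days for queryCount queries at the given latency. */
    static double estimateDays(long queryCount, double millisPerQuery) {
        double totalMillis = queryCount * millisPerQuery;
        double millisPerDay = 1000.0 * 60 * 60 * 24;
        return totalMillis / millisPerDay;
    }

    public static void main(String[] args) {
        long[] logged = {8609, 5172, 9313};   // timings from the performance log
        long queryCount = 720000L;            // "0.72 million" queries

        double mean = meanMillis(logged);
        System.out.println("Mean latency (ms): " + mean);
        System.out.println("Days at mean latency: " + estimateDays(queryCount, mean));
        System.out.println("Days at 5 s/query: " + estimateDays(queryCount, 5000.0));
        System.out.println("Days at 10 s/query: " + estimateDays(queryCount, 10000.0));
    }
}
```

With the three logged timings the mean is about 7.7 seconds per query, which projects to roughly 64 days. The stated 5-10 second range gives roughly 42 to 83 days, so the 70-day figure falls inside that range.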