Date: Tue, 4 Aug 2009 16:51:58 +0530
Subject: Re: How to improve search time?
From: prashant ullegaddi
To: java-user@lucene.apache.org

Shashi,

Our queries are free-text queries, but they are expanded into multi-field
Boolean queries. We also expand the original query with Lucene's SynExpand,
so a simple query can grow into a query roughly a page long. We don't store
any fields except the key (document IDs), target URLs, and titles.

Prashant.

On Tue, Aug 4, 2009 at 1:31 PM, Shashi Kant wrote:
> Prashant, I have had better luck with even larger indices on
> similar platforms. Could you elaborate on what types of queries you are
> running: multi-field? Boolean? Combinations? Also, you might want
> to remove unnecessary stored fields from the index and move them to a
> relational DB to squeeze out better performance.
>
> Shashi
>
> On Tue, Aug 4, 2009 at 3:18 AM, prashant ullegaddi wrote:
> > I did that as well. Actually, we had 32 indexes initially. We searched
> > them, and it was even worse.
> > After that I merged them into 4 indexes and did the same. No gain!
> >
> > Then I had to merge the 32 indexes into one.
> >
> > On Tue, Aug 4, 2009 at 10:48 AM, Anshum wrote:
> >
> >> Hi Prashant,
> >> 8 seconds as the minimum time is a little too much, though considering
> >> you're using just 4 GB of RAM it's still OK.
> >> I would advise you to break your index into smaller indexes, perhaps
> >> query the indexes selectively (if that's possible for your application),
> >> and use a ParallelMultiSearcher. It's just something you might try and
> >> like.
> >> All said and done, parallelizing would only get you a bell-curve-like
> >> performance graph, so you'd have to figure out the sweet spot there.
> >>
> >> --
> >> Anshum Gupta
> >> Naukri Labs!
> >> http://ai-cafe.blogspot.com
> >>
> >> The facts expressed here belong to everybody, the opinions to me. The
> >> distinction is yours to draw............
> >>
> >> On Tue, Aug 4, 2009 at 10:08 AM, prashant ullegaddi
> >> <prashullegaddi@gmail.com> wrote:
> >>
> >> > I'm running it on a quad-core machine, 2.4 GHz per core, 4 GB RAM.
> >> >
> >> > Prashant.
> >> >
> >> > On Tue, Aug 4, 2009 at 8:38 AM, Otis Gospodnetic
> >> > <otis_gospodnetic@yahoo.com> wrote:
> >> >
> >> > > With such a large index, be prepared to put it on a server with lots
> >> > > of RAM (even if you follow all the tips from the Wiki).
> >> > > When reporting performance numbers, you really ought to tell us about
> >> > > your hardware, types of queries, etc.
> >> > >
> >> > > Otis
> >> > > --
> >> > > Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> >> > > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >> > >
> >> > > ----- Original Message ----
> >> > > > From: prashant ullegaddi
> >> > > > To: java-user@lucene.apache.org
> >> > > > Sent: Monday, August 3, 2009 12:33:46 AM
> >> > > > Subject: How to improve search time?
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > I have a single index of about 87 GB containing around 50M documents.
> >> > > > When I search for any query, the best search time I have observed is
> >> > > > 8 sec. When the query is expanded with synonyms, the search takes
> >> > > > minutes (~2-3 min). Is there a better way to search so that the
> >> > > > overall search time goes down?
> >> > > >
> >> > > > Thanks,
> >> > > > Prashant.
> >> > >
> >> > > ---------------------------------------------------------------------
> >> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> > > For additional commands, e-mail: java-user-help@lucene.apache.org
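A rough, hypothetical illustration of why the synonym-expanded queries take minutes: if every query term is OR-ed with all of its synonyms, and every variant is then searched in every field, the number of term clauses grows as (terms plus synonyms) times fields. The Java sketch below assumes exactly that expansion model; the terms, synonyms, and field names are invented for the example and are not taken from the thread or from SynExpand's actual output.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical estimate of how query expansion multiplies Boolean clauses. */
public class ExpansionEstimate {

    /**
     * Counts term clauses under an assumed model: each query term expands to
     * itself plus its synonyms, and every variant is searched in every field.
     */
    static int countClauses(List<String> terms,
                            Map<String, List<String>> synonyms,
                            List<String> fields) {
        int variants = 0;
        for (String term : terms) {
            // the original term plus all of its synonyms (zero if none listed)
            variants += 1 + synonyms.getOrDefault(term, List.of()).size();
        }
        // every variant becomes one clause per searched field
        return variants * fields.size();
    }

    public static void main(String[] args) {
        // Made-up three-term query with made-up synonyms and field names.
        List<String> terms = List.of("cheap", "flight", "tickets");
        Map<String, List<String>> synonyms = Map.of(
                "cheap", List.of("inexpensive", "affordable", "budget", "low-cost"),
                "flight", List.of("airfare", "plane", "airline"),
                "tickets", List.of("fares", "passes"));
        List<String> fields = List.of("title", "body", "anchor");

        System.out.println(countClauses(terms, synonyms, fields));
    }
}
```

For this three-term example the count is (5 + 4 + 3) × 3 = 36 clauses, and each clause roughly means another postings list to read and score. A query that expands to a full page of text can therefore reach hundreds or thousands of clauses; Lucene of that era also caps a BooleanQuery at 1024 clauses by default (`BooleanQuery.getMaxClauseCount()`).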
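Anshum's suggestion, splitting the index and querying the pieces in parallel, is what Lucene's ParallelMultiSearcher offered in the 2.x API: it runs the query against each sub-index concurrently and merges the hits. The dependency-free Java sketch below shows only that scatter-gather idea, not Lucene's implementation. The `Shard` and `Hit` types are hypothetical stand-ins for per-index searchers, each shard holds toy precomputed scores, and the sketch assumes scores are already comparable across shards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical scatter-gather search over index shards (not Lucene's code). */
public class ShardFanOut {

    /** One scored hit: the shard it came from and its shard-local doc id. */
    record Hit(int shard, int doc, float score) {}

    static final Comparator<Hit> WORST_FIRST = Comparator.comparingDouble(Hit::score);
    static final Comparator<Hit> BEST_FIRST = WORST_FIRST.reversed();

    /** Stand-in for one sub-index: a precomputed score per document (toy data). */
    record Shard(int id, float[] scores) {
        /** Returns this shard's k best hits, best first. */
        List<Hit> topK(int k) {
            // min-heap on score: the root is the weakest hit kept so far
            PriorityQueue<Hit> heap = new PriorityQueue<>(WORST_FIRST);
            for (int doc = 0; doc < scores.length; doc++) {
                heap.add(new Hit(id, doc, scores[doc]));
                if (heap.size() > k) {
                    heap.poll(); // evict the weakest hit
                }
            }
            List<Hit> out = new ArrayList<>(heap);
            out.sort(BEST_FIRST);
            return out;
        }
    }

    /** Searches all shards on a fixed thread pool, then merges a global top k. */
    static List<Hit> search(List<Shard> shards, int k, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // scatter: one task per shard
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                pending.add(pool.submit(() -> shard.topK(k)));
            }
            // gather: any global top-k hit is also in its own shard's top k
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get());
            }
            merged.sort(BEST_FIRST);
            return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Shard> shards = List.of(
                new Shard(0, new float[] {0.2f, 0.9f, 0.4f}),
                new Shard(1, new float[] {0.7f, 0.1f}),
                new Shard(2, new float[] {0.8f, 0.3f, 0.6f}));
        for (Hit h : search(shards, 3, 3)) {
            System.out.println(h);
        }
    }
}
```

With k = 3 the example returns shard 0/doc 1 (0.9), shard 2/doc 0 (0.8), and shard 1/doc 0 (0.7). Merging only each shard's top k is enough because a document in the global top k must also rank in the top k of its own shard. The `threads` and shard counts are where Anshum's "sweet spot" shows up: each extra shard adds coordination and memory overhead, so past some point the gain flattens or reverses.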