From: "J. Delgado"
To: java-dev@lucene.apache.org
Date: Fri, 10 Apr 2009 23:07:24 -0700
Subject: Re: Grouping Lucene search results and calculating frequency by category
In-Reply-To: <22997958.post@talk.nabble.com>

Have you looked at SOLR? http://lucene.apache.org/solr/

It pretty much has what you are looking for.

-- Joaquin

On Fri, Apr 10, 2009 at 9:39 PM, mitu2009 wrote:
>
> I am working on a store search API using Lucene.
>
> I need to show store search results for each City,State combination,
> with its frequency in brackets. For example:
>
> Los Angeles, CA (450)
> Atlanta, GA (212)
> Boston, MA (78)
> . . .
>
> As of now, my search returns around 7000 Lucene documents on average
> when the user says "Show me all the stores". In this use case, I end up
> showing around 800 unique City,State records as shown above.
>
> I am overriding the HitCollector class's Collect method and retrieving
> the term vectors as follows:
>
>     var vectors = _reader.GetTermFreqVectors(doc);
>
> Then I iterate through this collection and calculate the frequency for
> each unique City,State combination.
>
> But this is turning out to be very slow. Is there a better way of
> grouping search results and calculating frequency in Lucene? A code
> snippet would be very helpful.
>
> Also, please suggest any other techniques or tips I could use to
> optimize my Lucene search code.
>
> Thanks for reading!
>
> --
> View this message in context:
> http://www.nabble.com/Grouping-Lucene-search-results-and-calculating-frequency-by-category-tp22997958p22997958.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
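
For anyone who wants to stay in plain Lucene, here is a minimal sketch of the counting step described in the quoted message. It is not Solr's implementation. It assumes the City,State value of every document is loaded once into an in-memory array of ordinals, so that the per-hit work is a single array increment instead of a term vector fetch. The class name `CityStateFacetCounter` and the arrays `ordByDoc` and `labels` are hypothetical. In Lucene of that era, `FieldCache.DEFAULT.getStringIndex(reader, field)` is, as far as I know, one way to obtain such arrays for a single-valued, untokenized field, but please check that against your version before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Counts hits per category from a per-document ordinal array that is
 * built once up front (hypothetical stand-in for a field cache).
 */
final class CityStateFacetCounter {
    private final int[] ordByDoc;   // docId -> index into labels
    private final String[] labels;  // ordinal -> "City,State" label
    private final int[] counts;     // ordinal -> number of hits seen so far

    CityStateFacetCounter(int[] ordByDoc, String[] labels) {
        this.ordByDoc = ordByDoc;
        this.labels = labels;
        this.counts = new int[labels.length];
    }

    /** Call once per matching document, e.g. from a collector callback. */
    void collect(int docId) {
        counts[ordByDoc[docId]]++;
    }

    /** Returns the non-zero counts, keyed by label, in label order. */
    Map<String, Integer> results() {
        Map<String, Integer> out = new LinkedHashMap<String, Integer>();
        for (int ord = 0; ord < labels.length; ord++) {
            if (counts[ord] > 0) {
                out.put(labels[ord], counts[ord]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up data: five documents spread over three City,State values.
        String[] labels = {"Los Angeles,CA", "Atlanta,GA", "Boston,MA"};
        int[] ordByDoc = {0, 1, 0, 2, 0};
        CityStateFacetCounter counter = new CityStateFacetCounter(ordByDoc, labels);
        for (int docId : new int[] {0, 1, 2, 4}) { // pretend these docs matched
            counter.collect(docId);
        }
        System.out.println(counter.results());
    }
}
```

The trade-off is memory for speed: the ordinal array costs one int per document in the index, but with about 7000 hits and 800 distinct values the counting loop no longer touches stored data for each hit.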