lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Search Results Clustering
Date Wed, 24 Aug 2005 23:22:04 GMT

the approach(es) I described in this thread...

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/%3cPine.LNX.4.58.0505111358460.9671@hal.rescomp.berkeley.edu%3e

...should work, but you have the added complexity of wanting the counts
not just for all unique values in a field, but all the permutations of
values from two fields -- which just means you need to compute a lot more
intersections.  Fortunately, Filters cache nicely so you only have to pay
the cost of computing them when your index changes.

Having done an *extensive* amount of work along this line recently, I can
tell you that something worth considering is to only precompute the
Filters for the individual terms in each field, and then find the
intersection of all the permutations at search time -- it reduces the
amount of precomputation needed, and in many cases can make a huge
difference in the amount of RAM needed to cache all of the Filters.
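
To make the search-time intersection concrete, here is a minimal sketch. It
assumes each per-term Filter has already been reduced to a java.util.BitSet of
matching document ids (one bit per document), and that the documents matching
the user's query are available as a BitSet too. The class name FacetCounts,
its methods, and the toy data in main are hypothetical, made up for
illustration; this is not Lucene API and not necessarily how the code
mentioned above was written.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class FacetCounts {

    /** Returns a new BitSet holding the documents present in both inputs. */
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone(); // clone so cached bits are never modified
        copy.and(b);
        return copy;
    }

    /**
     * For every value of the outer field, counts how many result documents
     * carry each value of the inner field. Only per-term BitSets are cached;
     * the permutations are intersected here, at search time.
     */
    static Map<String, Map<String, Integer>> nestedCounts(
            BitSet results,
            Map<String, BitSet> outerTerms,
            Map<String, BitSet> innerTerms) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> outer : outerTerms.entrySet()) {
            BitSet inOuter = intersect(results, outer.getValue());
            Map<String, Integer> inner = new LinkedHashMap<>();
            for (Map.Entry<String, BitSet> in : innerTerms.entrySet()) {
                inner.put(in.getKey(), intersect(inOuter, in.getValue()).cardinality());
            }
            counts.put(outer.getKey(), inner);
        }
        return counts;
    }

    /** Convenience helper for building toy data: sets one bit per document id. */
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        // Hypothetical per-term bits for an 8-document index, one map per field.
        Map<String, BitSet> field3 = new LinkedHashMap<>();
        field3.put("4", bits(0, 1, 2, 3, 4));
        field3.put("5", bits(5, 6, 7));

        Map<String, BitSet> field4 = new LinkedHashMap<>();
        field4.put("1", bits(0, 1, 5));
        field4.put("7", bits(2, 3));
        field4.put("2", bits(4, 6, 7));

        // Documents matching the user's query (assumed to be collected elsewhere).
        BitSet results = bits(0, 1, 2, 4, 5, 6);

        Map<String, Map<String, Integer>> counts = nestedCounts(results, field3, field4);
        for (Map.Entry<String, Map<String, Integer>> outer : counts.entrySet()) {
            System.out.println("+results in [FIELD-3:" + outer.getKey() + "]");
            for (Map.Entry<String, Integer> inner : outer.getValue().entrySet()) {
                System.out.println("---results in [FIELD-4:" + inner.getKey() + "] = "
                        + inner.getValue());
            }
        }
    }
}
```

With this layout the cache holds one BitSet per term (five in the toy data)
rather than one per pair of terms (six here, and growing with the product of
the two fields' value counts), at the price of a few extra and() calls per
search.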


: Date: Wed, 24 Aug 2005 00:46:46 -0700 (PDT)
: From: "kapilChhabra (sent by Nabble.com)" <lists@nabble.com>
: Reply-To: java-user@lucene.apache.org,
:      kapilChhabra <kapil.chhabra@naukri.com>
: To: java-user@lucene.apache.org
: Subject: Search Results Clustering
:
:
: Hi All,
: I have been using Lucene in my application to search over 4 million records updated daily.
: I am currently using a single index with 21 fields.
: Some of my fields contain numbers that are foreign keys to my data. I have provided a dropdown
: of values to select from, on my search form, to search on these fields.
:
: A typical scenario of my index/search  is:
: FIELD-1:token;index - formatted text using WhitespaceAnalyzer
: FIELD-2:token;index - formatted text using WhitespaceAnalyzer
: FIELD-3:index - integers[foreign keys] stored as string.
: FIELD-4:index - integers[foreign keys] stored as string.
:
: Sample Search Query:
:
: (FIELD-1:apple OR FIELD-1:orange) AND (FIELD-3:4 OR FIELD-3:5)
:
: The results are sorted on FIELD-4.
:
: I am getting results as expected.
:
: An additional requirement is to bunch the search results and display the count for each bunch.
: e.g. output:
: Search Results:
: 1. Doc 100
: 2. Doc 209
: 3. Doc 897
: etc...
:
: Search Clusters:
: Total Results = 540
: +results in [FIELD-3:4] = 400
: ---results in [FIELD-4:1] = 150
: ---results in [FIELD-4:7] = 130
: ---results in [FIELD-4:3] = 100
: ---results in [FIELD-4:others] = 20
:
: +results in [FIELD-3:5] = 140
: ---results in [FIELD-4:2] = 90
: ---results in [FIELD-4:1] = 30
: ---results in [FIELD-4:others] = 20
:
: I have no clue how to do it using a single index.
: Any pointers in this will be highly appreciated.
:
: Thanks in advance,
:
: Regards,
: KapilChhabra
: --
: Sent from the Lucene - Java Users forum at Nabble.com:
: http://www.nabble.com/Search-Results-Clustering-t249355.html#a696937
:



-Hoss

