lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: categorized search
Date Wed, 11 May 2005 21:03:03 GMT

well ... once you have the list of all "category" names that are in docs
which match your original query, you can either redo the original query
with "and category:XXXX" to get the counts, or you can pre-compute (and
save) a BitSet for each category in your index (easy to build using a
HitCollector or a Filter), and find the cardinality of the intersection of
each of those BitSets with a BitSet from your search (again: using a
HitCollector on your original query).
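
as a rough sketch of the intersection step only, using nothing but
java.util.BitSet: the class name CategoryCounts, the helper bits() and the
doc ids below are made up for illustration, and the code assumes you have
already built one BitSet per category and one BitSet for the query hits
(that part would come from a HitCollector or Filter and is not shown here).

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of Lucene.
final class CategoryCounts {

    /** For each category, count the docs that are in that category AND in the query hits. */
    static Map<String, Integer> countPerCategory(Map<String, BitSet> categoryBits,
                                                 BitSet queryHits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : categoryBits.entrySet()) {
            // Clone first: BitSet.and() modifies the receiver, and the
            // per-category sets are meant to be saved and reused.
            BitSet both = (BitSet) entry.getValue().clone();
            both.and(queryHits);
            counts.put(entry.getKey(), both.cardinality());
        }
        return counts;
    }

    /** Builds a BitSet with the given doc ids set (illustration only). */
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        // Made-up data: which docs belong to which category.
        Map<String, BitSet> categoryBits = new LinkedHashMap<>();
        categoryBits.put("search engines", bits(1, 2, 3, 5));
        categoryBits.put("open source", bits(4, 6, 7));

        // Made-up data: docs matching the original query.
        BitSet queryHits = bits(1, 2, 3, 4, 7);

        // prints {search engines=3, open source=2}
        System.out.println(countPerCategory(categoryBits, queryHits));
    }
}
```

the clone is there so the saved per-category BitSets stay untouched between
searches; the cost is one copy per category per query.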

for the record: this is not a trivial task.  i've described the bare basics
of the issue ... but there's a lot of processing going on to get these
kinds of numbers.

if you search the list for "category" and "count" you'll find this has
come up at least one other time in the last few months.


: Date: Thu, 5 May 2005 20:37:19 +0200
: From: Pablo Gomes Ludermir <gomesp@gmail.com>
: Reply-To: java-user@lucene.apache.org,
:      Pablo Gomes Ludermir <gomesp@gmail.com>
: To: java-user@lucene.apache.org
: Subject: Re: categorized search
:
: Chris,
:
: That was partially what I needed. You got it right when I said I
: needed the number of categories in which a particular term appears (and it
: works).
: But I would also like to know in how many documents in each category
: that term appears.
:
: For instance: title:lucene appears in the categories "search engines"
: and "open source software", and it appears in documents 1, 2 and 3
: in the category "search engines" and in documents 4 and 7 in the
: category "open source". I could not get it to work yet (maybe because
: of my lack of experience with Lucene).
: Could someone give me a hand?
: Thanks
: Pablo
:
: On 4/24/05, Chris Hostetter <hossman_lucene@fucit.org> wrote:
: >
: > : >I have indexed a field that describes the "category" of the document.
: > : >Thus, I want to know how many categories have a specific term. Could
: > : >someone help me to get this with good performance?
: >
: > I think I'm reading this question differently than Chuck, so I'll toss out
: > something totally different...
: >
: > as I understand it, you've indexed a bunch of documents, with a variety of
: > fields, one of which is "category" (for example, maybe you are indexing
: > news articles, that each have a "title", "description", "url", and
: > "category").  Now you have a term like "title:lucene" (or
: > "description:pope") and you want to know the number of unique terms in the
: > category field that exist in articles that contain your input term.
: >
: > If that's what you're looking for, then you can probably achieve this by:
: >   1) make a TermQuery for your input term (ie: "title:lucene")
: >   2) put that TermQuery in a QueryFilter, and call bits(reader)
: >   3) call FieldCache.DEFAULT.getStrings(reader,"category")
: >   4) loop over the true bits in the BitSet from #2, and for each one, add
: >      the corresponding entry from the String[] in #3 to a Set.
: >
: > when you're all done, the Set will be the list of categories, and the size
: > of that Set is the number (i think) you wanted.
: >
: > (DISCLAIMER: I've never actually used FieldCache, i'm just giving you my
: > advice based on reading the javadocs)
: >
: > -Hoss
: >
: >
: > ---------------------------------------------------------------------
: > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: > For additional commands, e-mail: java-user-help@lucene.apache.org
: >
: >
:
:
: --
: Pablo Gomes Ludermir
: gomesp@gmail.com
:



-Hoss

