lucene-java-user mailing list archives

From Phil Rosen
Subject RE: term vectors
Date Wed, 15 Nov 2006 18:07:20 GMT
Thanks for your help!

Here is an example. I have 100 items, each with a set of potentially unique 
attributes. Attributes could be color, size, length, density, etc. An 
example document could be:

Id: 1
ItemType: foo
Blob-field: all sorts of text handled normally
Outer-Color: Red
Size: Large
Temperature: hot

I would like to get the sum of frequency counts for each term in the fields 
I specify, across the search results. I could iterate through the 
documents, call getTermFreqVector() for each desired field on each 
document, and sum the results, but this seems slow to me.
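The summing step of that per-document loop can be sketched roughly as below. This is only an illustration under assumptions: FieldTermTotals is a hypothetical helper, not part of Lucene, and the parallel String[]/int[] arrays stand in for what a TermFreqVector exposes through getTerms() and getTermFrequencies(). Fetching the vectors from the index is left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a Lucene class): running totals per field and term.
class FieldTermTotals {
    // field name -> (term -> summed frequency across all documents added so far)
    private final Map<String, Map<String, Integer>> totals = new TreeMap<>();

    // Add one document's term vector for one field.
    // terms and freqs are parallel arrays: freqs[i] is the count of terms[i].
    void add(String field, String[] terms, int[] freqs) {
        if (terms.length != freqs.length) {
            throw new IllegalArgumentException("terms and freqs must have equal length");
        }
        Map<String, Integer> perField = totals.computeIfAbsent(field, f -> new TreeMap<>());
        for (int i = 0; i < terms.length; i++) {
            perField.merge(terms[i], freqs[i], Integer::sum);
        }
    }

    // Summed frequency of a term in a field, or 0 if it was never seen.
    int total(String field, String term) {
        Map<String, Integer> perField = totals.get(field);
        return perField == null ? 0 : perField.getOrDefault(term, 0);
    }

    public static void main(String[] args) {
        FieldTermTotals t = new FieldTermTotals();
        // Two made-up documents, using field names from the example above.
        t.add("Outer-Color", new String[] {"red"}, new int[] {1});
        t.add("Size", new String[] {"large"}, new int[] {1});
        t.add("Outer-Color", new String[] {"red"}, new int[] {1});
        System.out.println("Outer-Color/red total: " + t.total("Outer-Color", "red"));
    }
}
```

In a real search loop, add() would be called once per hit and per field of interest, so the cost still grows with hits times fields; the sketch only shows the bookkeeping, not a faster way to obtain the counts.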

-----Original Message-----
From: Michael D. Curtin
Sent: Wednesday, November 15, 2006 11:35 AM
Subject: Re: term vectors

Phil Rosen wrote:

> I am building an application that requires me to index a set of documents
> on the scale of hundreds of thousands.
> A document can have a varying number of attribute fields with an unknown
> set of potential values. I realize that just indexing a blob of fields
> would be much faster; however, I need to bin the search results based on
> common attributes. As different types of attributes could potentially have
> overlapping values, a single blob for all attributes won't work.
> My question is this: is there a way to get term frequencies for a set of
> documents or hits without using getTermFreqVector() on each document and
> each attribute field? As I could have hundreds of results, each with
> dozens of attribute fields, looping getTermFreqVector() would be very
> slow. If there isn't something inherent to Lucene, has anyone seen an
> extension that could accomplish this?

Could you give an example of what you're starting with, what a search looks
like, and what you want out?  It sounds almost like you're looking for a
custom statistical analysis of hits, which I doubt Lucene is going to have 
for you out of the box ...


To unsubscribe, e-mail:
For additional commands, e-mail:
