lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Rosen <pro...@optaros.com>
Subject RE: term vectors
Date Wed, 15 Nov 2006 18:07:20 GMT
Thanks for your help!

Here is an example, I have 100 items, each with a set of potentially unique 
attributes. Attributes could be color, size, length, density, etc. So an 
example document could be:

Id: 1
ItemType: foo
Blob-field: all sorts of text handled normally
Outer-Color: Red
Size: Large
Temperature: hot
Etc:
Etc:

I would like to get the sum of frequency counts for each term in the fields 
I specify across the search results. I can just iterate through the 
documents and use getTermFreqVector() for each desired field on each 
document, then sum that; but this seems slow to me.




-----Original Message-----
From: Michael D. Curtin [mailto:mike@curtin.com]
Sent: Wednesday, November 15, 2006 11:35 AM
To: java-user@lucene.apache.org
Subject: Re: term vectors

Phil Rosen wrote:

> I am building an application that requires I index a set of documents on
> the scale of hundreds of thousands.
>
> A document can have a varying number of attribute fields with an unknown
> set of potential values. I realize that just indexing a blob of fields
> would be much faster, however I need to bin the search results based on
> common attributes; as different types of attributes could potentially have
> overlapping values a single blob for all attributes wont work.
>
> My question is this, is there a way to get term frequencies for a set of
> documents or hits, without using getTermFreqVector() on each document and
> each attribute field? As I could have hundreds of results, each with
> dozens of attribute fields, looping getTermFreqVector() would be very
> slow. If there isn't something inherent to lucene, has anyone seen an
> extension that could accomplish this?

Could you give an example of what you're starting with, what a search looks
like, and what you want out?  It sounds almost like you're looking for a
custom statistical analysis of hits, which I doubt Lucene is going to have 
for
you, out of the box ...

--MDC

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message