lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: term vectors
Date Wed, 15 Nov 2006 22:00:06 GMT
Can this be done as an offline task?

On Nov 15, 2006, at 1:07 PM, Phil Rosen wrote:

> Thanks for your help!
>
> Here is an example, I have 100 items, each with a set of  
> potentially unique
> attributes. Attributes could be color, size, length, density, etc.  
> So an
> example document could be:
>
> Id: 1
> ItemType: foo
> Blob-field: all sorts of text handled normally
> Outer-Color: Red
> Size: Large
> Temperature: hot
> Etc:
> Etc:
>
> I would like to get the sum of frequency counts for each term in  
> the fields
> I specify across the search results. I can just iterate through the
> documents and use getTermFreqVector() for each desired field on each
> document, then sum that; but this seems slow to me.
>
>
>
>
> -----Original Message-----
> From: Michael D. Curtin [mailto:mike@curtin.com]
> Sent: Wednesday, November 15, 2006 11:35 AM
> To: java-user@lucene.apache.org
> Subject: Re: term vectors
>
> Phil Rosen wrote:
>
>> I am building an application that requires I index a set of  
>> documents on
>> the scale of hundreds of thousands.
>>
>> A document can have a varying number of attribute fields with an  
>> unknown
>> set of potential values. I realize that just indexing a blob of  
>> fields
>> would be much faster, however I need to bin the search results  
>> based on
>> common attributes; as different types of attributes could  
>> potentially have
>> overlapping values a single blob for all attributes wont work.
>>
>> My question is this, is there a way to get term frequencies for a  
>> set of
>> documents or hits, without using getTermFreqVector() on each  
>> document and
>> each attribute field? As I could have hundreds of results, each with
>> dozens of attribute fields, looping getTermFreqVector() would be very
>> slow. If there isn't something inherent to lucene, has anyone seen an
>> extension that could accomplish this?
>
> Could you give an example of what you're starting with, what a  
> search looks
> like, and what you want out?  It sounds almost like you're looking  
> for a
> custom statistical analysis of hits, which I doubt Lucene is going  
> to have
> for
> you, out of the box ...
>
> --MDC
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message