Date: Wed, 15 Nov 2006 11:34:48 -0500
From: "Michael D. Curtin" <mike@curtin.com>
To: java-user@lucene.apache.org
Subject: Re: term vectors

Phil Rosen wrote:

> I am building an application that requires I index a set of documents on
> the scale of hundreds of thousands.
> 
> A document can have a varying number of attribute fields with an unknown
> set of potential values. I realize that just indexing a blob of fields
> would be much faster; however, I need to bin the search results based on
> common attributes. Because different types of attributes could have
> overlapping values, a single blob for all attributes won't work.
> 
> My question is this: is there a way to get term frequencies for a set of
> documents or hits, without using getTermFreqVector() on each document and
> each attribute field? As I could have hundreds of results, each with
> dozens of attribute fields, looping getTermFreqVector() would be very
> slow. If there isn't something inherent to Lucene, has anyone seen an
> extension that could accomplish this?

Could you give an example of what you're starting with, what a search looks
like, and what you want out?  It sounds almost like you're looking for a
custom statistical analysis of hits, which I doubt Lucene provides out of
the box.
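
To make the question concrete, here is a minimal sketch of the per-field
binning it seems to describe. It uses only plain Java collections and is not
Lucene API. The nested maps are a hypothetical stand-in for whatever per-field
term data gets collected for each hit (for example, from getTermFreqVector()),
and the names HitFieldBinner and binByField are made up for illustration. It
shows only the aggregation step and does not address the cost of fetching the
term data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HitFieldBinner {

    /**
     * Sums term frequencies per field across a set of hits.
     * Each hit is modeled as: field name -> (term -> frequency).
     * This shape is an assumption for illustration, not a Lucene type.
     */
    static Map<String, Map<String, Integer>> binByField(
            List<Map<String, Map<String, Integer>>> hits) {
        // TreeMap keeps fields and terms in sorted order for stable output.
        Map<String, Map<String, Integer>> bins = new TreeMap<>();
        for (Map<String, Map<String, Integer>> hit : hits) {
            for (Map.Entry<String, Map<String, Integer>> field : hit.entrySet()) {
                // One bin per field, so equal values in different fields stay apart.
                Map<String, Integer> bin =
                        bins.computeIfAbsent(field.getKey(), k -> new TreeMap<>());
                for (Map.Entry<String, Integer> term : field.getValue().entrySet()) {
                    bin.merge(term.getKey(), term.getValue(), Integer::sum);
                }
            }
        }
        return bins;
    }

    public static void main(String[] args) {
        List<Map<String, Map<String, Integer>>> hits = new ArrayList<>();
        hits.add(Map.of("color", Map.of("red", 1), "size", Map.of("large", 1)));
        hits.add(Map.of("color", Map.of("red", 1), "shape", Map.of("large", 1)));
        // prints {color={red=2}, shape={large=1}, size={large=1}}
        System.out.println(binByField(hits));
    }
}
```

Note that "large" is counted separately under "size" and "shape", which is
the overlapping-values case that rules out a single blob field.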

--MDC
