From: "Michael D. Curtin"
Date: Wed, 15 Nov 2006 11:34:48 -0500
To: java-user@lucene.apache.org
Subject: Re: term vectors

Phil Rosen wrote:
> I am building an application that requires I index a set of documents on
> the scale of hundreds of thousands.
>
> A document can have a varying number of attribute fields with an unknown
> set of potential values. I realize that just indexing a blob of fields
> would be much faster; however, I need to bin the search results based on
> common attributes. As different types of attributes could potentially have
> overlapping values, a single blob for all attributes won't work.
>
> My question is this: is there a way to get term frequencies for a set of
> documents or hits without using getTermFreqVector() on each document and
> each attribute field? As I could have hundreds of results, each with
> dozens of attribute fields, looping getTermFreqVector() would be very
> slow. If there isn't something inherent to Lucene, has anyone seen an
> extension that could accomplish this?

Could you give an example of what you're starting with, what a search looks
like, and what you want out? It sounds almost like you're looking for a
custom statistical analysis of hits, which I doubt Lucene is going to have
for you out of the box ...

--MDC
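
For concreteness, the per-attribute binning described in the question might look something like the sketch below. It is plain Java and deliberately uses no Lucene API: the class name HitBinning, both methods, and the sample data are hypothetical, and it assumes each hit's attribute values have already been loaded into memory as a map from field name to values. It only illustrates the desired output, not an efficient way to obtain it from an index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Lucene.
class HitBinning {

    // Counts, for each attribute field, how often each value occurs across all hits.
    // Counts are kept per field, so the same value under two fields stays separate.
    static Map<String, Map<String, Integer>> binByAttribute(
            List<Map<String, List<String>>> hits) {
        Map<String, Map<String, Integer>> bins = new TreeMap<>();
        for (Map<String, List<String>> hit : hits) {
            for (Map.Entry<String, List<String>> field : hit.entrySet()) {
                Map<String, Integer> counts =
                        bins.computeIfAbsent(field.getKey(), k -> new TreeMap<>());
                for (String value : field.getValue()) {
                    counts.merge(value, 1, Integer::sum);
                }
            }
        }
        return bins;
    }

    // Made-up sample hits: "orange" appears both as a color and as a fruit.
    static List<Map<String, List<String>>> sampleHits() {
        List<Map<String, List<String>>> hits = new ArrayList<>();

        Map<String, List<String>> first = new HashMap<>();
        first.put("color", Arrays.asList("orange"));
        first.put("fruit", Arrays.asList("apple"));
        hits.add(first);

        Map<String, List<String>> second = new HashMap<>();
        second.put("color", Arrays.asList("red"));
        second.put("fruit", Arrays.asList("orange"));
        hits.add(second);

        Map<String, List<String>> third = new HashMap<>();
        third.put("color", Arrays.asList("orange"));
        hits.add(third);

        return hits;
    }

    public static void main(String[] args) {
        // Prints the per-field value counts for the sample hits.
        System.out.println(binByAttribute(sampleHits()));
    }
}
```

Keeping one count table per field is what keeps overlapping values apart: "orange" as a color and "orange" as a fruit land in different bins, which a single combined blob could not do.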