From: Anton Leuski
To: java-dev@lucene.apache.org
Subject: Adding information to an index
Date: Sat, 8 Oct 2005 13:31:15 -0700

Greetings,

I would like to store some additional information in a Lucene index, and I am looking for advice on how to implement this. Specifically, I am planning to store:

1) the collection frequency count for each term,
2) the actual document length for each document (yes, I looked at the norm factor; I am still considering how to adapt it),
3) the collection size (total number of terms) for each field,
4) the vocabulary size (number of unique terms) for each field.
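To make the four quantities concrete, here is a minimal sketch of how they could be defined and kept up to date for a single field. It is plain Java and does not touch the Lucene index format or API. The names FieldStats, addDocument, and removeDocument are hypothetical, and it assumes each document arrives as an already-analyzed list of tokens and that the same tokens are available again when the document is deleted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-field statistics holder; an illustration only, not Lucene code. */
class FieldStats {
    // 1) collection frequency: total occurrences of each term over all documents
    private final Map<String, Long> collectionFrequency = new HashMap<>();
    // 2) document length: number of tokens in each document, keyed by document id
    private final Map<Integer, Integer> documentLength = new HashMap<>();
    // 3) collection size: total number of tokens in this field
    private long collectionSize = 0;

    /** Adds the counts for one document (assumes tokens are already analyzed). */
    void addDocument(int docId, List<String> tokens) {
        if (documentLength.containsKey(docId)) {
            throw new IllegalArgumentException("duplicate document id: " + docId);
        }
        for (String term : tokens) {
            collectionFrequency.merge(term, 1L, Long::sum);
        }
        documentLength.put(docId, tokens.size());
        collectionSize += tokens.size();
    }

    /** Subtracts the counts for one document; the caller must supply the same tokens. */
    void removeDocument(int docId, List<String> tokens) {
        Integer length = documentLength.remove(docId);
        if (length == null) {
            return; // unknown document, nothing to undo
        }
        for (String term : tokens) {
            // Drop the entry entirely once its count reaches zero,
            // so that the map size stays equal to the vocabulary size.
            collectionFrequency.computeIfPresent(
                term, (k, v) -> v > 1 ? Long.valueOf(v - 1) : null);
        }
        collectionSize -= length;
    }

    long collectionFrequency(String term) {
        return collectionFrequency.getOrDefault(term, 0L);
    }

    int documentLength(int docId) {
        return documentLength.getOrDefault(docId, 0);
    }

    long collectionSize() {
        return collectionSize;
    }

    // 4) vocabulary size: number of distinct terms currently in the field
    int vocabularySize() {
        return collectionFrequency.size();
    }
}
```

One instance would be kept per field. The sketch only shows what each number means and how it changes on add and delete; where to persist the numbers inside the index is the open question.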
All of this information can be computed on the fly, but I would prefer to generate it at indexing time and store it somewhere. I think I have figured out how to handle #1: I found a post by Doug Cutting about it that pointed me in the right direction. What should I do about the rest of the information? I would like the implementation to update the counts automatically as documents are added to and deleted from the index.

Thank you.

-- Anton