Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-dev@lucene.apache.org
Received-SPF: neutral (asf.osuosl.org: local policy)
Date: Sun, 9 Oct 2005 17:42:41 -0700 (PDT)
From: Chris Hostetter <hossman_lucene@fucit.org>
Sender: hossman@hal.rescomp.berkeley.edu
To: java-dev@lucene.apache.org
Subject: Re: Adding information to an index
In-Reply-To: <C4250D1D-EEF1-41E7-815A-B6DC74AE868F@ict.usc.edu>
Message-ID: <Pine.LNX.4.58.0510091731050.24574@hal.rescomp.berkeley.edu>
References: <C4250D1D-EEF1-41E7-815A-B6DC74AE868F@ict.usc.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

: I'm looking to store some additional information in a Lucene index
: and I'm looking for an advise on how to implement the functionality.
: Specifically, I'm planning to store 1) collection frequency count for
: each term, 2) actual document length for each document (yes, I looked
: at the norm factor, I'm still considering how to adapt it...) 3)
: collection size (total number of terms) for each field 4) vocabulary
: size (number of unique terms) for each field. All this info can be
: computed on the fly, but I would prefer to generate it at the
: indexing time and store somewhere.

Unless I'm misunderstanding your terminology, it seems like all of this
information is either already stored in the index or easy to add using
the existing API.


  #1 - Searchable.docFreq(Term):int
  #2 - add as a new field per document.
  #3 & #4 ...

...these are a little trickier.  You can easily get both by iterating over
IndexReader.terms(), but if you specifically want to store the data in the
index, I would first add all of your documents, then use the TermEnum
to compute the information and put it all as stored fields in a single
"metadata" document with no indexed fields (or at least: none in common
with your regular data).

Now you've precomputed everything you want to know, and it's easily
available at query time.
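
To make the #3/#4 bookkeeping concrete, here is a rough sketch of the
aggregation step in plain Java. It deliberately does not touch Lucene:
TermEntry, FieldStats and FieldStatsSketch are hypothetical stand-ins I made
up for illustration. The assumption is that you walk the TermEnum once, that
each (field, text) pair shows up exactly once, and that you have already
worked out the total occurrence count for each term (I believe summing
TermDocs.freq() over IndexReader.termDocs(term) would give you that, but
check it against your Lucene version).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one entry of a term enumeration; NOT a Lucene class.
final class TermEntry {
    final String field;   // field the term belongs to
    final String text;    // term text
    final long totalFreq; // occurrences of the term summed over all documents

    TermEntry(String field, String text, long totalFreq) {
        this.field = field;
        this.text = text;
        this.totalFreq = totalFreq;
    }
}

// Per-field totals: the values you would write into the "metadata" document.
final class FieldStats {
    long collectionSize; // #3: total number of term occurrences in the field
    long vocabularySize; // #4: number of unique terms in the field
}

final class FieldStatsSketch {
    // Assumes each (field, text) pair appears exactly once in the input,
    // as it would when walking a term enumeration from start to finish.
    static Map<String, FieldStats> compute(Iterable<TermEntry> terms) {
        Map<String, FieldStats> stats = new LinkedHashMap<>();
        for (TermEntry t : terms) {
            FieldStats s = stats.computeIfAbsent(t.field, f -> new FieldStats());
            s.vocabularySize += 1;           // one more unique term in this field
            s.collectionSize += t.totalFreq; // add all of its occurrences
        }
        return stats;
    }

    public static void main(String[] args) {
        // Made-up sample data, only to exercise the method.
        List<TermEntry> terms = Arrays.asList(
            new TermEntry("body", "apache", 3),
            new TermEntry("body", "lucene", 5),
            new TermEntry("title", "lucene", 1));
        for (Map.Entry<String, FieldStats> e : compute(terms).entrySet()) {
            System.out.println(e.getKey()
                + ": collectionSize=" + e.getValue().collectionSize
                + ", vocabularySize=" + e.getValue().vocabularySize);
        }
    }
}
```

Each entry of the resulting map would then become a couple of stored fields
on the metadata document (one pair per field name, for instance).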


-Hoss

