lucene-dev mailing list archives

From Grant Ingersoll <gsing...@syr.edu>
Subject Re: Flexible Indexing (was Re: Lucene Planning)
Date Fri, 02 Jun 2006 13:48:22 GMT
I thought it was you, but wasn't sure.

I would also like a way to store the frequency of each term in the
overall collection. It probably belongs in the Term dictionary, at the
cost of an additional VInt per term, but I am not sure and am open to
other places to store it. Right now, to get this number one has to
either store it separately at indexing time (using a term-counting
Filter) or calculate it at runtime by looping over the TermDocs and
summing the per-document frequencies.
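
To make the trade-off concrete, here is a rough, self-contained sketch of
the runtime approach and of what one extra VInt would cost. It does not
use the Lucene classes: the Postings interface is a hypothetical stand-in
modeled on the next()/freq() calls of TermDocs, and fromArray() is only
there to feed it sample data. The VInt encoding follows the usual scheme
of seven data bits per byte, low-order bytes first, with the high bit
marking that more bytes follow.

```java
import java.io.ByteArrayOutputStream;

public class CollectionFrequencySketch {

    /** Hypothetical stand-in for a per-term postings iterator (modeled on TermDocs). */
    interface Postings {
        boolean next();   // advance to the next document containing the term
        int freq();       // occurrences of the term in the current document
    }

    /** Minimal in-memory Postings over an array of per-document frequencies. */
    static Postings fromArray(final int[] freqs) {
        return new Postings() {
            private int pos = -1;
            public boolean next() { return ++pos < freqs.length; }
            public int freq() { return freqs[pos]; }
        };
    }

    /** Runtime approach: walk every posting for the term and sum freq(). */
    static long collectionFrequency(Postings postings) {
        long total = 0;
        while (postings.next()) {
            total += postings.freq();
        }
        return total;
    }

    /** Encode a non-negative int as a VInt: 7 bits per byte, low-order first. */
    static byte[] encodeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);  // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write(value);                      // final byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A term that occurs 3, 1 and 2 times in three documents.
        long cf = collectionFrequency(fromArray(new int[] {3, 1, 2}));
        System.out.println("collection frequency = " + cf);
        System.out.println("VInt bytes needed    = " + encodeVInt((int) cf).length);
    }
}
```

The loop touches every posting for the term, which is the cost I would
like to avoid; a stored count would take one byte for values up to 127
and two bytes up to 16383.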

Marvin Humphrey wrote:
>
> On Jun 1, 2006, at 5:48 AM, Grant Ingersoll wrote:
>
>> Someone on the list a while ago suggested moving Term Vectors out of 
>> the postings and storing them separately, as then they don't have to 
>> be merged (but the doc ids would have to be kept up to date)
>
> Yes, that was me.  :)  I suggested storing TermVector data alongside 
> stored field data, in the .fdt file.  That's what KinoSearch does 
> right now.  It cuts down on disk seeks.
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

-- 

Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 



