lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: RE : Term Collection Frequency?
Date Wed, 04 Aug 2004 21:03:33 GMT
Grant Ingersoll wrote:
> Once again, I think a generic Metadata Reader/Writer interface would be
> the ideal solution for all of these types of problems.
> 
> See
> http://issues.apache.org/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgId=1777978
> 
> I am more than willing to help w/ an implementation, but do not want to
> go it alone w/o some consensus from the committers/Doug that such an
> idea would be accepted as I think the change may be fairly involved.

My concern is that truly generic metadata of this sort would be big and 
slow.  But I'd love to see a proposal that performs well!

Adding, e.g., collection frequency to indexes would not be too hard: 
you'd need to add a field to TermInfo, extend TermInfosWriter, 
DocumentWriter, and SegmentMerger to maintain it, then extend 
SegmentTermEnum, IndexReader, SegmentReader and MultiReader to access 
it.  Indexes would be a little larger and a little slower, but not 
significantly.
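
To make the bookkeeping concrete, here is a minimal sketch of what
maintaining collection frequency involves. It is not Lucene code: the
class CollectionFreqSketch, the TermStats type, and the buildSegment and
mergeSegments methods are hypothetical names standing in for the roles
of TermInfo, DocumentWriter, and SegmentMerger, and it uses in-memory
maps rather than the on-disk term dictionary. It also assumes no
deleted documents, which a real merge would have to account for.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CollectionFreqSketch {

    /** Hypothetical per-term statistics; stands in for a TermInfo with one extra field. */
    static final class TermStats {
        final int docFreq;          // number of documents containing the term
        final long collectionFreq;  // total occurrences of the term across all documents

        TermStats(int docFreq, long collectionFreq) {
            this.docFreq = docFreq;
            this.collectionFreq = collectionFreq;
        }

        /** Both statistics are additive across disjoint sets of documents. */
        TermStats merge(TermStats other) {
            return new TermStats(docFreq + other.docFreq,
                                 collectionFreq + other.collectionFreq);
        }
    }

    /** Indexing side: build a term dictionary for one segment from tokenized documents. */
    static Map<String, TermStats> buildSegment(List<List<String>> docs) {
        Map<String, TermStats> dict = new TreeMap<>();
        for (List<String> doc : docs) {
            // Count occurrences of each term within this document.
            Map<String, Integer> withinDoc = new HashMap<>();
            for (String term : doc) {
                withinDoc.merge(term, 1, Integer::sum);
            }
            // Each document adds 1 to docFreq and its term count to collectionFreq.
            for (Map.Entry<String, Integer> e : withinDoc.entrySet()) {
                dict.merge(e.getKey(), new TermStats(1, e.getValue()), TermStats::merge);
            }
        }
        return dict;
    }

    /** Merge side: combine per-segment dictionaries by summing both statistics. */
    static Map<String, TermStats> mergeSegments(List<Map<String, TermStats>> segments) {
        Map<String, TermStats> merged = new TreeMap<>();
        for (Map<String, TermStats> segment : segments) {
            for (Map.Entry<String, TermStats> e : segment.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), TermStats::merge);
            }
        }
        return merged;
    }
}
```

The extra cost in this sketch is one more number per term in the
dictionary and one more addition per term at merge time, which is
consistent with the "a little larger and a little slower" estimate.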

Architecting things so that this same change could be easily made 
without modifying any internals is a much bigger challenge.  And, once 
this is done, making it so that index size and performance are little 
affected is harder yet.  If you have a design that achieves this, please 
share it.

Doug



