lucene-dev mailing list archives

From "ABDOU Samir" <samir.ab...@unine.ch>
Subject RE : Term Collection Frequency?
Date Thu, 05 Aug 2004 11:35:45 GMT
Thanks Doug,

Your answer gave me an important hint for my initial question. I think one way to add this
without affecting performance is to store the information in a separate file and read it only
when the search model needs it, so that the IndexReader loads the data on demand. If the search
model doesn't use this information to calculate scores, the file containing the collection
frequencies is never loaded. The advantage is that the existing index structure is not altered!
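
To make the idea concrete, here is a minimal sketch of such an on-demand file in plain Java.
The class name CollectionFrequencySidecar, the file layout (an entry count followed by
term/frequency pairs) and the method names are assumptions made for illustration only. None of
this is existing Lucene API, and a real implementation would have to be written at indexing and
merge time and exposed through IndexReader.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sidecar store for collection frequencies; not part of Lucene. */
final class CollectionFrequencySidecar {
    private final Path file;
    private Map<String, Long> frequencies; // stays null until the first lookup

    CollectionFrequencySidecar(Path file) {
        this.file = file;
    }

    /** Writes term -> collection frequency pairs to a separate file. */
    static void write(Path file, Map<String, Long> freqs) {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(freqs.size());
            for (Map.Entry<String, Long> e : freqs.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True once the sidecar file has been read into memory. */
    synchronized boolean isLoaded() {
        return frequencies != null;
    }

    /** Returns the collection frequency of a term, reading the file on first use. */
    synchronized long collectionFrequency(String term) {
        if (frequencies == null) {
            frequencies = load();
        }
        return frequencies.getOrDefault(term, 0L);
    }

    private Map<String, Long> load() {
        Map<String, Long> loaded = new HashMap<>();
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String term = in.readUTF();
                loaded.put(term, in.readLong());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return loaded;
    }
}
```

A search model that ignores collection frequencies never calls collectionFrequency(), so the
file is never opened. A model that needs them pays the loading cost once, on the first lookup.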


Regards,
Samir

> -----Original Message-----
> From: Doug Cutting [mailto:cutting@apache.org]
> Sent: Wednesday, 4 August 2004 23:04
> To: Lucene Developers List
> Subject: Re: RE : Term Collection Frequency?
> 
> Grant Ingersoll wrote:
> > Once again, I think a generic Metadata Reader/Writer interface would be
> > the ideal solution for all of these types of problems.
> >
> > See
> > http://issues.apache.org/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgId=1777978
> >
> > I am more than willing to help w/ an implementation, but do not want to
> > go it alone w/o some consensus from the committers/Doug that such an
> > idea would be accepted as I think the change may be fairly involved.
> 
> My concern is that truly generic metadata of this sort would be big and
> slow.  But I'd love to see a proposal that performs well!
> 
> Adding, e.g., collection frequency to indexes would not be too hard:
> you'd need to add a field to TermInfo, extend TermInfosWriter,
> DocumentWriter, and SegmentMerger to maintain it, then extend
> SegmentTermEnum, IndexReader, SegmentReader and MultiReader to access
> it.  Indexes would be a little larger and a little slower, but not
> significantly.
> 
> Architecting things so that this same change could be easily made
> without modifying any internals is a much bigger challenge.  And, once
> this is done, making it so that index size and performance is little
> altered is harder yet.  If you have a design that achieves this, please
> share it.
> 
> Doug
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 




