lucene-dev mailing list archives

From "Michael McCandless" <luc...@mikemccandless.com>
Subject Re: TermVectorsWriter and DocumentsWriter
Date Fri, 17 Aug 2007 18:18:54 GMT
"Doug Cutting" <cutting@apache.org> wrote:
> Michael McCandless wrote:
> > One thing I have been wondering is whether it really is necessary to
> > sort the term vectors before writing to the index....
> 
> Terms in vectors are prefix-compressed.  So not sorting would make 
> indexes bigger, and slower to read & write.
> 
> http://lucene.apache.org/java/docs/fileformats.html#Term%20Vectors

Duh, I forgot about that :)  So I think we should indeed continue to
write them sorted.
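
To make the size argument concrete, here is a rough sketch of why sorted order helps prefix compression. It uses a simplified cost model that is an assumption for illustration, not Lucene's actual on-disk encoding: each term costs one byte for the shared-prefix length, one byte for the suffix length, plus the UTF-8 bytes of the suffix. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PrefixCompressionSketch {

    // Number of leading chars that a and b have in common.
    static int sharedPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Assumed cost model (not the real file format): per term,
    // 1 byte prefix length + 1 byte suffix length + suffix bytes.
    // Each term is encoded relative to the term written just before it.
    // The prefix is counted in chars, so this is only exact for terms
    // without surrogate pairs.
    static int encodedSize(List<String> terms) {
        String previous = "";
        int total = 0;
        for (String term : terms) {
            int prefix = sharedPrefix(previous, term);
            String suffix = term.substring(prefix);
            total += 2 + suffix.getBytes(StandardCharsets.UTF_8).length;
            previous = term;
        }
        return total;
    }
}
```

Calling encodedSize on the same terms in sorted and in unsorted order shows the difference: neighbours in sorted order tend to share long prefixes, so fewer suffix bytes are written.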

> Also, having them sorted makes it much easier to do dot products between 
> document vectors, a potentially common operation.

True.
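
As a sketch of why sorted order makes this easy: with both vectors sorted by term, the dot product is a single merge-style pass with no hashing or lookups. The parallel-array layout (terms plus frequencies) and the names below are assumptions for illustration, not the term vector API.

```java
public class SortedVectorDotProduct {

    // Dot product of two sparse term vectors. Each vector is given as
    // parallel arrays: terms in ascending String order, and the matching
    // frequency for each term. Runs in O(lenA + lenB).
    static long dot(String[] termsA, int[] freqsA,
                    String[] termsB, int[] freqsB) {
        long sum = 0;
        int i = 0;
        int j = 0;
        while (i < termsA.length && j < termsB.length) {
            int cmp = termsA[i].compareTo(termsB[j]);
            if (cmp == 0) {
                // Term occurs in both documents: it contributes to the sum.
                sum += (long) freqsA[i] * freqsB[j];
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // term only in A, skip it
            } else {
                j++; // term only in B, skip it
            }
        }
        return sum;
    }
}
```

With unsorted vectors the same computation would need a hash map or a sort up front for at least one of the two documents.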

Mike
