Caution: there are a lot of substantial changes in this commit. I've tested things pretty well, but there may well be more bugs. Please consider CVS a little flakier than usual right now, until a few folks have tested these changes.

But please do give these changes a try! They should make many phrase and conjunctive queries faster, especially with big indexes. Tell me if you have any problems.

Cheers,

Doug

cutting@apache.org wrote:
> +
> + 1. Changed the format of the .tis file, so that:
> +
> +    - it has a format version number, which makes it easier to
> +      back-compatibly change file formats in the future.
> +
> +    - the term count is now stored as a long. This was the one aspect
> +      of Lucene's file formats that limited index size.
> +
> +    - a few internal index parameters are now stored in the index, so
> +      that they can (in theory) be changed from index to index,
> +      although there is not yet an API to do so.
> +
> +    These changes are back-compatible: the new code can read old
> +    indexes, but old code will not be able to read new indexes. (cutting)
> +
> + 2. Added an optimized implementation of TermDocs.skipTo(). A skip
> +    table is now stored for each term in the .frq file. This adds
> +    only a percent or two to overall index size, but can substantially
> +    speed up many searches. (cutting)
> +
> + 3. Restructured the Scorer API and all Scorer implementations to take
> +    advantage of an optimized TermDocs.skipTo() implementation. In
> +    particular, PhraseQuerys and conjunctive BooleanQuerys are
> +    faster when one clause has substantially fewer matches than the
> +    others. (A conjunctive BooleanQuery is a BooleanQuery where all
> +    clauses are required.) (cutting)
> +
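To illustrate why a fast skipTo() matters for conjunctive queries, here is a minimal sketch of a "leapfrog" intersection: each clause jumps straight to the largest document number seen so far instead of stepping through every posting. It assumes a hypothetical DocIterator interface loosely modeled on TermDocs' next()/doc()/skipTo(). It is not Lucene's actual Scorer code, and the in-memory binary search only stands in for the on-disk skip table in the .frq file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConjunctionSketch {

    /** Hypothetical postings iterator, loosely modeled on TermDocs. Not Lucene's API. */
    interface DocIterator {
        boolean next();             // advance to the next doc; false when exhausted
        int doc();                  // current doc number
        boolean skipTo(int target); // advance past the current doc to the first doc >= target
    }

    /** Array-backed iterator; binary search stands in for an on-disk skip table. */
    static final class ArrayDocIterator implements DocIterator {
        private final int[] docs; // sorted, distinct doc numbers
        private int pos = -1;

        ArrayDocIterator(int... docs) { this.docs = docs; }

        public boolean next() { return ++pos < docs.length; }

        public int doc() { return docs[pos]; }

        public boolean skipTo(int target) {
            int from = pos + 1; // only search forward of the current position
            if (from >= docs.length) { pos = docs.length; return false; }
            int i = Arrays.binarySearch(docs, from, docs.length, target);
            pos = (i >= 0) ? i : -i - 1; // exact hit, or first doc greater than target
            return pos < docs.length;
        }
    }

    /** Returns the docs present in every iterator ("all clauses required"). */
    static List<Integer> intersect(DocIterator... its) {
        List<Integer> out = new ArrayList<>();
        if (its.length == 0) return out;
        for (DocIterator it : its) {
            if (!it.next()) return out; // an empty clause means no matches
        }
        while (true) {
            // Every clause must reach at least the largest current doc.
            int target = its[0].doc();
            for (DocIterator it : its) target = Math.max(target, it.doc());
            boolean allMatch = true;
            for (DocIterator it : its) {
                if (it.doc() < target) {
                    if (!it.skipTo(target)) return out; // a clause ran out: done
                    if (it.doc() != target) allMatch = false;
                }
            }
            if (allMatch) {
                out.add(target);
                for (DocIterator it : its) {
                    if (!it.next()) return out;
                }
            }
        }
    }

    public static void main(String[] args) {
        DocIterator rare = new ArrayDocIterator(7, 500, 9000);
        DocIterator common = new ArrayDocIterator(1, 2, 3, 7, 8, 500, 501, 8999, 9000);
        System.out.println(intersect(rare, common)); // prints [7, 500, 9000]
    }
}
```

With a rare clause and a common one, the rare clause effectively drives the loop and the common clause skips over long runs of postings. That is the case the changelog describes, where one clause has substantially fewer matches than the others.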