lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: cvs commit: jakarta-lucene/src/test/org/apache/lucene/util English.java
Date Thu, 15 Jan 2004 22:47:22 GMT
Caution: there are a lot of substantial changes in this commit.  I've 
tested things pretty well, but there may well be more bugs.  Please 
consider the CVS a little flakier than usual right now, until a few 
folks have tested these changes.

But please do give these changes a try!  They should make a lot of 
phrase and conjunctive queries faster, especially with big indexes. 
Tell me if you have any problems.

Cheers,

Doug

cutting@apache.org wrote:
>   +
>   + 1. Changed the format of the .tis file, so that:
>   +
>   +    - it has a format version number, which makes it easier to
>   +      back-compatibly change file formats in the future.
>   +
>   +    - the term count is now stored as a long.  This was the one aspect
>   +      of Lucene's file formats which limited index size.
>   +
>   +    - a few internal index parameters are now stored in the index, so
>   +      that they can (in theory) now be changed from index to index,
>   +      although there is not yet an API to do so.
>   +
>   +    These changes are back compatible.  The new code can read old
>   +    indexes.  But old code will not be able to read new indexes. (cutting)
>   +
>   + 2. Added an optimized implementation of TermDocs.skipTo().  A skip
>   +    table is now stored for each term in the .frq file.  This only
>   +    adds a percent or two to overall index size, but can substantially
>   +    speed up many searches.  (cutting)
>   +
>   + 3. Restructured the Scorer API and all Scorer implementations to take
>   +    advantage of an optimized TermDocs.skipTo() implementation.  In
>   +    particular, PhraseQuerys and conjunctive BooleanQuerys are
>   +    faster when one clause has substantially fewer matches than the
>   +    others.  (A conjunctive BooleanQuery is a BooleanQuery where all
>   +    clauses are required.)  (cutting)
>   +
>   +
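
Item 1 describes a versioned .tis header with a long term count, readable by new code whether the file is old or new. The sketch below shows one common way to get that property. It is NOT Lucene's actual on-disk layout: the class name, the version value -1, and the default intervals are hypothetical. The key assumption is that old files begin with a non-negative int term count, so a negative first int can mark a newer, versioned layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical term-dictionary header illustrating back-compatible versioning.
// Assumption: old files start with a non-negative int term count.
final class TermDictHeader {
    static final int FORMAT_CURRENT = -1;           // hypothetical version marker
    static final int DEFAULT_INDEX_INTERVAL = 128;  // hypothetical defaults applied
    static final int DEFAULT_SKIP_INTERVAL = 16;    // to old, unversioned files

    final int format;        // 0 means "old layout, no version number"
    final long termCount;    // a long, so the count is not capped at 2^31 - 1
    final int indexInterval; // per-index parameters, now stored in the file
    final int skipInterval;

    TermDictHeader(int format, long termCount, int indexInterval, int skipInterval) {
        this.format = format;
        this.termCount = termCount;
        this.indexInterval = indexInterval;
        this.skipInterval = skipInterval;
    }

    // Always writes the newest layout: version, count, then parameters.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(FORMAT_CURRENT);
            out.writeLong(termCount);
            out.writeInt(indexInterval);
            out.writeInt(skipInterval);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads both layouts. An old reader expecting a non-negative count would
    // instead see a negative number, so it cannot read the new layout.
    static TermDictHeader fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int first = in.readInt();
            if (first >= 0) {
                // Old layout: the first int is the term count itself.
                return new TermDictHeader(0, first, DEFAULT_INDEX_INTERVAL, DEFAULT_SKIP_INTERVAL);
            }
            if (first < FORMAT_CURRENT) {
                // More negative means a format newer than this code understands.
                throw new IOException("unknown format version " + first);
            }
            // Java evaluates arguments left to right, so fields are read in order.
            return new TermDictHeader(first, in.readLong(), in.readInt(), in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reserving negative values for versions means future formats can keep counting downward (-2, -3, ...) while every older reader's "is this a count?" check keeps failing cleanly.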

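Item 2 adds a per-term skip table so that TermDocs.skipTo() can jump ahead without reading every posting. As a rough illustration only, here is an in-memory model: a hypothetical SkippingPostings class records the last document of every block of skipInterval postings, and skipTo() uses those entries to jump over whole blocks before scanning. Lucene's real skip data is stored in the .frq file and its encoding is not shown here. The postingsRead counter exists only to make the savings visible.

```java
// Hypothetical in-memory model of one term's postings plus a skip table.
final class SkippingPostings {
    private final int[] docs;        // strictly increasing document numbers
    private final int skipInterval;  // one skip entry per skipInterval postings (assumed)
    private final int[] skipDocs;    // skipDocs[k] = last doc in block k
    private int skipIdx = 0;         // skip entries consumed so far
    private int pos = -1;            // current posting; -1 before the first next()
    int postingsRead = 0;            // postings visited, to show what skipping saves

    SkippingPostings(int[] docs, int skipInterval) {
        if (skipInterval <= 0) throw new IllegalArgumentException("skipInterval must be > 0");
        this.docs = docs.clone();
        this.skipInterval = skipInterval;
        this.skipDocs = new int[docs.length / skipInterval];
        for (int k = 0; k < skipDocs.length; k++) {
            skipDocs[k] = docs[(k + 1) * skipInterval - 1];
        }
    }

    int doc() { return docs[pos]; }

    // Advances to the next posting; returns false once the list is exhausted.
    boolean next() {
        if (pos < docs.length) pos++;
        if (pos < docs.length) { postingsRead++; return true; }
        return false;
    }

    // Moves beyond the current posting to the first doc >= target, first using
    // the skip table to jump over whole blocks that end before target.
    boolean skipTo(int target) {
        while (skipIdx < skipDocs.length && skipDocs[skipIdx] < target) {
            skipIdx++;
        }
        int lastBelow = skipIdx * skipInterval - 1; // end of the last block skipped
        if (lastBelow > pos) pos = lastBelow;       // only ever jump forwards
        do {
            if (!next()) return false;
        } while (doc() < target);
        return true;
    }
}
```

For 1000 postings and skipInterval = 16, skipTo(1500) on the docs 0, 2, ..., 1998 consults the skip entries and then reads about 15 postings instead of roughly 750. A larger interval shrinks the table but lengthens the final linear scan.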

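Item 3 says conjunctive BooleanQuerys get faster when one clause is much rarer than the others. The usual way skipTo() delivers that is a "leapfrog" intersection. Whichever clause is furthest ahead sets the target, and every other clause skips straight to it rather than stepping through each of its own matches. The DocIterator interface below borrows TermDocs' next()/doc()/skipTo() naming but is a hypothetical simplification, not Lucene's Scorer API. ArrayDocIterator's binary search stands in for a skip table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified iterator over the documents matching one clause.
interface DocIterator {
    boolean next();              // advance to the next matching doc
    int doc();                   // current doc; valid after next/skipTo returned true
    boolean skipTo(int target);  // move beyond current to the first doc >= target
}

// Sorted int[] iterator; skipTo() binary-searches the unread suffix.
final class ArrayDocIterator implements DocIterator {
    private final int[] docs;
    private int pos = -1;

    ArrayDocIterator(int... docs) { this.docs = docs.clone(); }

    public boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    public int doc() { return docs[pos]; }

    public boolean skipTo(int target) {
        int lo = Math.min(pos + 1, docs.length), hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        pos = lo;
        return pos < docs.length;
    }
}

final class Conjunction {
    // Returns the docs matched by every iterator (all clauses required).
    static List<Integer> intersect(DocIterator... its) {
        List<Integer> out = new ArrayList<>();
        if (its.length == 0) return out;
        int target = Integer.MIN_VALUE;
        for (DocIterator it : its) {
            if (!it.next()) return out;          // an empty clause matches nothing
            target = Math.max(target, it.doc());
        }
        while (true) {
            boolean allMatch = true;
            for (DocIterator it : its) {
                if (it.doc() < target && !it.skipTo(target)) return out;
                if (it.doc() > target) {         // overshot: raise the target, restart
                    target = it.doc();
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {                      // every clause is on the same doc
                out.add(target);
                if (!its[0].next()) return out;
                target = its[0].doc();
            }
        }
    }
}
```

Because the rare clause keeps pushing the target far ahead, the dense clauses make a few skipTo() calls instead of thousands of next() calls, so the work roughly follows the size of the rarest clause. PhraseQuery can use the same pattern to find candidate documents containing every term before it checks positions.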
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

