lucene-java-user mailing list archives

From "Uwe Schindler" <>
Subject RE: Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes
Date Sun, 11 Jan 2015 10:05:30 GMT


First, there is also a migration guide next to the change log:


1. If you implement an Analyzer, you have to override createComponents(), which returns a
TokenStreamComponents object. See the source code of other Analyzers to understand how to use
it. One simple example is in the Javadocs:
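
Independent of the Javadocs example, a minimal sketch of such an analyzer might look like the
following. It assumes Lucene 4.10.x with lucene-core and lucene-analyzers-common on the classpath
and uses the Version-less constructors available in 4.10 (earlier 4.x releases need a Version
argument). The class name and the tokens() helper are made up for illustration; your own
SeparationFilter would be added to the chain where the comment indicates.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class, not part of Lucene.
public class LowercaseWhitespaceAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // The Tokenizer is the source of the chain; it consumes the Reader.
        Tokenizer source = new WhitespaceTokenizer(reader);
        // TokenFilters wrap the tokenizer; custom filters would be chained here.
        TokenStream result = new LowerCaseFilter(source);
        // Hand both ends of the chain to Lucene so the components can be reused.
        return new TokenStreamComponents(source, result);
    }

    // Illustration-only helper: runs the analyzer over a string and collects the terms.
    static List<String> tokens(String text) throws IOException {
        List<String> out = new ArrayList<String>();
        Analyzer analyzer = new LowercaseWhitespaceAnalyzer();
        try (TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                       // mandatory before incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            analyzer.close();
        }
        return out;
    }
}
```

Calling LowercaseWhitespaceAnalyzer.tokens("Hello World") should give the two terms "hello" and
"world".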


2. Use initReader() to wrap filters around readers. This method is protected and can be overridden.
CharFilter extends Reader, so you can wrap any CharFilter there. Your HTMLStripCharFilter
has to be wrapped around the given reader here.
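
As a rough sketch, assuming Lucene 4.10.x and the HTMLStripCharFilter from
lucene-analyzers-common, the wrapping could be done like this. The class name and the tokens()
helper are hypothetical and only there to show the effect.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class, not part of Lucene.
public class HtmlStrippingAnalyzer extends Analyzer {

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // Character-level filtering happens here, before the tokenizer sees the text.
        return new HTMLStripCharFilter(reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // The reader passed in here is handled by Lucene; no CharReader.get() is needed.
        Tokenizer source = new WhitespaceTokenizer(reader);
        return new TokenStreamComponents(source, new LowerCaseFilter(source));
    }

    // Illustration-only helper: analyzes an HTML snippet and collects the terms.
    static List<String> tokens(String html) throws IOException {
        List<String> out = new ArrayList<String>();
        Analyzer analyzer = new HtmlStrippingAnalyzer();
        try (TokenStream ts = analyzer.tokenStream("body", new StringReader(html))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } finally {
            analyzer.close();
        }
        return out;
    }
}
```

With this in place, the markup in an input such as "<p>Hello <b>World</b></p>" is removed before
tokenization, so only the terms "hello" and "world" come out.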


3./4. Term vectors are different in Lucene 4. Basically, the term vectors are a small index for
each document, and this is how it is implemented. You get back Fields/Terms instances, which
are basically like an AtomicReader's backend, so you can even execute a Query on the vectors:

IndexReader#getTermVector() returns Terms for a specific field:


For all Fields (harder to use; unwrapping for a specific field is already done by the method
above, so this one is more for executing queries and so on):
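
To make the two calls more concrete, here is a sketch under a few assumptions: Lucene 4.10.3,
an in-memory index built only for the demonstration, and a field named "body" indexed with term
vectors and positions. The class and method names are invented for this example. It reads the
positions that TermPositionVector used to provide via IndexReader#getTermVector(), and lists the
fields that have vectors via IndexReader#getTermVectors().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

// Hypothetical example class, not part of Lucene.
public class TermVectorExample {

    // Term -> positions for one field of one document (empty if no vector was stored).
    static Map<String, List<Integer>> positions(IndexReader reader, int docId, String field)
            throws IOException {
        Map<String, List<Integer>> result = new TreeMap<String, List<Integer>>();
        Terms vector = reader.getTermVector(docId, field);
        if (vector == null) {
            return result;
        }
        TermsEnum termsEnum = vector.iterator(null);
        DocsAndPositionsEnum dpe = null;
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            List<Integer> pos = new ArrayList<Integer>();
            // Returns null if positions were not stored in the term vector.
            dpe = termsEnum.docsAndPositions(null, dpe);
            if (dpe != null && dpe.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                int freq = dpe.freq();
                for (int i = 0; i < freq; i++) {
                    pos.add(dpe.nextPosition());
                }
            }
            result.put(term.utf8ToString(), pos);
        }
        return result;
    }

    // Names of all fields of a document that have term vectors.
    static List<String> fieldsWithVectors(IndexReader reader, int docId) throws IOException {
        List<String> names = new ArrayList<String>();
        Fields vectors = reader.getTermVectors(docId);
        if (vectors != null) {
            for (String name : vectors) {   // Fields is Iterable<String>
                names.add(name);
            }
        }
        return names;
    }

    // Demonstration-only index with a single document.
    static IndexReader buildSampleIndex() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_4_10_3, new WhitespaceAnalyzer()));
        FieldType type = new FieldType(TextField.TYPE_NOT_STORED);
        type.setStoreTermVectors(true);
        type.setStoreTermVectorPositions(true);
        type.freeze();
        Document doc = new Document();
        doc.add(new Field("body", "to be or not to be", type));
        writer.addDocument(doc);
        writer.close();
        return DirectoryReader.open(dir);
    }
}
```

For the sample document, positions(reader, 0, "body") should map "to" to the positions 0 and 4,
and fieldsWithVectors(reader, 0) should contain only "body".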






Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen




From: Martin Wunderlich [] 
Sent: Sunday, January 11, 2015 9:18 AM
Subject: Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes


Hi all, 


I am currently in the process of upgrading a search engine application from Lucene 3.5.0 to
version 4.10.3. There have been some substantial API changes in version 4 that break backward
compatibility. I have managed to fix most of them, but a few issues remain that I could use
some help with:

1.	"cannot override final method from Analyzer"

The original code extended the Analyzer class and then overrode tokenStream(...).

public TokenStream tokenStream(String fieldName, Reader reader) {
    CharStream charStream = CharReader.get(reader);
    return new LowerCaseFilter(version,
        new SeparationFilter(version,
            new WhitespaceTokenizer(version,
                new HTMLStripFilter(charStream))));
}

But this method is final now and I am not sure how to understand the following note from the
change log: 

"ReusableAnalyzerBase has been renamed to Analyzer. All Analyzer implementations must now
use Analyzer.TokenStreamComponents, rather than overriding .tokenStream() and .reusableTokenStream()
(which are now final). "

There is another problem in the method quoted above: 

2.	"The method get(Reader) is undefined for the type CharReader"

There seem to have been some considerable changes here, too. 

3.	"TermPositionVector cannot be resolved to a type"

This class is gone now in Lucene 4. Are there any simple fixes for this? From the change
log: "The term vectors APIs (TermFreqVector, TermPositionVector, TermVectorMapper) have been
removed in favor of the above flexible indexing APIs, presenting a single-document inverted
index of the document from the term vectors."

Probably related to this: 4. "The method getTermFreqVector(int, String) is undefined for the
type IndexReader."

Both problems occur here, for instance: 

TermPositionVector termVector = (TermPositionVector) reader.getTermFreqVector(...);

("reader" is of Type IndexReader)

I would appreciate any help with these issues. Thanks a lot in advance.




PS: FYI, I have posted the same question on Stackoverflow:
