lucene-java-user mailing list archives

From Martin Wunderlich <martin.wunderl...@gmx.net>
Subject Re: Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes
Date Sun, 11 Jan 2015 20:25:56 GMT
Hi Uwe, 

Thanks a lot for the detailed reply. I'll see how far I get with it, but being quite new to
Lucene, it seems I am lacking a bit of background information to fully understand the response
below. In particular, I need to do some background reading on how token streams and readers
work, I guess. 

Cheers, 

Martin
 

On 11.01.2015 at 11:05, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi, 
> 
> 
> 
> First, there is also a migration guide alongside the change log: http://lucene.apache.org/core/4_10_3/MIGRATE.html
> 
> 
> 
> 1. If you implement an Analyzer, you have to override createComponents(), which returns a
> TokenStreamComponents object. See the source code of other Analyzers to understand how to use it.
> One simple example is in the Javadocs: http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/analysis/Analyzer.html
> 
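A minimal sketch of point 1, assuming Lucene 4.10.3 with lucene-core and lucene-analyzers-common on the classpath. The class name SketchAnalyzer and the tokens() helper are hypothetical, and the SeparationFilter from the original question is left out because its source is not shown in this thread. It uses the Version-less constructors available in 4.10.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical analyzer: whitespace tokenization followed by lowercasing.
public class SketchAnalyzer extends Analyzer {

    // Replaces the old tokenStream(...) override, which is final in Lucene 4.
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        TokenStream result = new LowerCaseFilter(source);
        // The components object keeps both ends of the chain so it can be reused.
        return new TokenStreamComponents(source, result);
    }

    // Hypothetical helper: runs the analyzer over a string and collects the terms.
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<String>();
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                 // mandatory before the first incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }
}
```

With this sketch, SketchAnalyzer.tokens(new SketchAnalyzer(), "Hello World") should yield the two terms "hello" and "world".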
> 
> 
> 2. Use initReader() to wrap CharFilters around the reader. This method is protected and can be
> overridden. CharFilter extends Reader, so you can wrap any CharFilter there. Your HTMLStripCharFilter
> has to be wrapped around the given reader here.
> 
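A sketch of point 2 under the same assumptions (Lucene 4.10.3, lucene-analyzers-common on the classpath). It assumes that the HTMLStripFilter in the original code can be replaced by Lucene's own org.apache.lucene.analysis.charfilter.HTMLStripCharFilter. The class name HtmlStrippingAnalyzer and the tokens() helper are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical analyzer: strips HTML markup before whitespace tokenization.
public class HtmlStrippingAnalyzer extends Analyzer {

    // Called by Analyzer before createComponents() sees the reader;
    // this replaces the old CharReader.get(reader) wrapping.
    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        return new HTMLStripCharFilter(reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // No token filters here, to keep the focus on initReader().
        return new TokenStreamComponents(new WhitespaceTokenizer(reader));
    }

    // Hypothetical helper: analyzes a string and collects the resulting terms.
    static List<String> tokens(String html) throws IOException {
        List<String> out = new ArrayList<String>();
        try (Analyzer analyzer = new HtmlStrippingAnalyzer();
             TokenStream ts = analyzer.tokenStream("body", html)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }
}
```

Because the char filter is applied in initReader(), the tokenizer never sees the markup, and token offsets are still corrected back to the original input by the CharFilter.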
> 
> 
> 3./4. Term vectors are different in Lucene 4. Basically, term vectors are a small inverted index
> for each document, and this is how they are implemented. You get back Fields/Terms instances,
> which are basically the same structures that back an AtomicReader; you can even execute a Query
> on the vectors:
> 
> IndexReader#getTermVector() returns Terms for a specific field:
> 
> <http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/index/IndexReader.html#getTermVector(int,%20java.lang.String)>
> 
> For all fields (harder to use; the unwrapping for a specific field is done by the method above,
> and this one is more for executing Queries and so on):
> 
> <http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/index/IndexReader.html#getTermVectors(int)>
> 
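A sketch of how the old TermPositionVector usage could be replaced via IndexReader#getTermVector(), assuming Lucene 4.10.3 with lucene-core and lucene-analyzers-common on the classpath. The class name, the field name "body", and both helper methods are hypothetical; the small in-memory index only exists to make the example runnable.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class TermVectorSketch {

    // Reads term -> positions for one field of one document from its term vector.
    static Map<String, List<Integer>> positions(IndexReader reader, int docId, String field)
            throws IOException {
        Map<String, List<Integer>> result = new TreeMap<>();
        Terms vector = reader.getTermVector(docId, field);
        if (vector == null) {
            return result;                       // no term vector stored for this field
        }
        TermsEnum termsEnum = vector.iterator(null);
        DocsAndPositionsEnum dpEnum = null;
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            dpEnum = termsEnum.docsAndPositions(null, dpEnum);
            if (dpEnum == null) {
                continue;                        // positions were not stored
            }
            dpEnum.nextDoc();                    // a term vector holds exactly one document
            List<Integer> pos = new ArrayList<>();
            int freq = dpEnum.freq();
            for (int i = 0; i < freq; i++) {
                pos.add(dpEnum.nextPosition());
            }
            result.put(term.utf8ToString(), pos);
        }
        return result;
    }

    // Builds a one-document in-memory index with term vectors and positions enabled.
    static IndexReader buildIndex(String text) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg =
                new IndexWriterConfig(Version.LUCENE_4_10_3, new WhitespaceAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            FieldType type = new FieldType(TextField.TYPE_STORED);
            type.setStoreTermVectors(true);
            type.setStoreTermVectorPositions(true);
            type.freeze();
            Document doc = new Document();
            doc.add(new Field("body", text, type));
            writer.addDocument(doc);
        }
        return DirectoryReader.open(dir);
    }
}
```

Note that term vectors must have been enabled on the field at index time; for fields indexed without them, getTermVector() returns null, which the sketch treats as an empty result.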
> 
> 
> Uwe
> 
> 
> 
> -----
> 
> Uwe Schindler
> 
> H.-H.-Meier-Allee 63, D-28213 Bremen
> 
> <http://www.thetaphi.de/> http://www.thetaphi.de
> 
> eMail: uwe@thetaphi.de
> 
> 
> 
> From: Martin Wunderlich [mailto:martin.wunderlich@gmx.net] 
> Sent: Sunday, January 11, 2015 9:18 AM
> To: java-user@lucene.apache.org
> Subject: Upgrading Lucene from 3.5 to 4.10 - how to handle Java API changes
> 
> 
> 
> Hi all, 
> 
> 
> 
> I am currently in the process of upgrading a search engine application from Lucene 3.5.0
> to version 4.10.3. There have been some substantial API changes in version 4 that break backward
> compatibility. I have managed to fix most of them, but a few issues remain that I could use
> some help with:
> 
> 1.	"cannot override final method from Analyzer"
> 
> The original code extended the Analyzer class and overrode tokenStream(...).
> 
> @Override
> public TokenStream tokenStream(String fieldName, Reader reader) {
>    CharStream charStream = CharReader.get(reader);        
>    return
>        new LowerCaseFilter(version,
>            new SeparationFilter(version,
>                new WhitespaceTokenizer(version,
>                    new HTMLStripFilter(charStream))));
> }
> 
> But this method is final now and I am not sure how to understand the following note from
> the change log:
> 
> "ReusableAnalyzerBase has been renamed to Analyzer. All Analyzer implementations must
> now use Analyzer.TokenStreamComponents, rather than overriding .tokenStream() and .reusableTokenStream()
> (which are now final). "
> 
> There is another problem in the method quoted above: 
> 
> 2.	"The method get(Reader) is undefined for the type CharReader"
> 
> There seem to have been some considerable changes here, too. 
> 
> 3.	"TermPositionVector cannot be resolved to a type"
> 
> This class is gone now in Lucene 4. Are there any simple fixes for this? From the change
> log: "The term vectors APIs (TermFreqVector, TermPositionVector, TermVectorMapper) have been
> removed in favor of the above flexible indexing APIs, presenting a single-document inverted
> index of the document from the term vectors."
> 
> Probably related to this: 4. "The method getTermFreqVector(int, String) is undefined
> for the type IndexReader."
> 
> Both problems occur here, for instance: 
> 
> TermPositionVector termVector = (TermPositionVector) reader.getTermFreqVector(...);
> 
> ("reader" is of Type IndexReader)
> 
> I would appreciate any help with these issues. Thanks a lot in advance.
> 
> Cheers, 
> 
> Martin
> 
> 
> 
> PS: FYI, I have posted the same question on Stackoverflow: http://stackoverflow.com/questions/27881296/upgrading-lucene-from-3-5-to-4-10-how-to-handle-java-api-changes?noredirect=1#comment44166161_27881296
> 

