lucene-pylucene-dev mailing list archives

From "Thomas Koch" <k...@orbiteam.de>
Subject SynonymAnalyzer(s) in PyLucene34
Date Thu, 27 Oct 2011 07:27:05 GMT
Hi,
While experimenting with the SynonymAnalyzer samples (pylucene-3.4 samples), I
discovered that the WordNet example is broken due to an outdated WordNet
database: SynonymAnalyzerTest works fine, but SynonymAnalyzerViewer
fails with:
...lucene.JavaError: org.apache.lucene.index.IndexFormatTooOldException:
Format version is not supported in file 'segments': 44132 (needs to be
between -1 and -11). This version of Lucene only supports indexes created
with release 3.0 and later.

The WordNetSynonymEngine uses an index that is shipped in the indexes.tgz file
and looked up in indexes\wordnet. This index (dated 2004) appears to be in an
old Lucene index format. I managed to find the files required to rebuild the
index for Lucene 3.4, adjusted the WordNetSynonymEngine to work with Lucene
3.4, and everything seems to be working again. I've created an archive with
the relevant changes and uploaded it to the pylucene-extra project, just in
case anyone is interested:
http://code.google.com/a/apache-extras.org/p/pylucene-extra/downloads/list

BTW, who is maintaining/updating the samples that are included in the
distribution?

Note that the SynonymAnalyzer examples are based on the LIA (Lucene in
Action) book and implement their own synonym support, whereas Java Lucene 3.4
already ships synonym support in the package
org.apache.lucene.analysis.synonym (in contrib).
 
From the CHANGELOG:
  LUCENE-3233, LUCENE-3375: Added SynonymFilter for applying multi-word
  synonyms during indexing or querying (with parsers for wordnet and solr
  formats). Removed contrib/wordnet.
 
SynonymFilter is already included in the PyLucene core as
lucene.SynonymFilter. However, I couldn't find any samples or tests for this
new feature, so I'll have to experiment with it as well. Let me know if
anyone has experience with the new lucene.SynonymFilter and whether it has
any advantages over the Python-based implementation (in
pylucene-3.4\samples\LuceneInAction\lia\analysis\synonym).
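For comparison, here is a minimal pure-Python sketch of the single-word approach the LIA-style samples take. It assumes, as in the book's SynonymFilter, that synonyms are injected at the same token position with a position increment of 0. This is not the PyLucene API: DictSynonymEngine, get_synonyms and synonym_filter are hypothetical names, and a plain dict stands in for the WordNet index.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


class DictSynonymEngine:
    """Hypothetical stand-in for a synonym engine such as WordNetSynonymEngine.

    Looks synonyms up in an in-memory dict instead of a Lucene index.
    """

    def __init__(self, synonyms: Dict[str, List[str]]) -> None:
        self._synonyms = synonyms

    def get_synonyms(self, word: str) -> List[str]:
        # Unknown words simply have no synonyms.
        return self._synonyms.get(word, [])


def synonym_filter(
    tokens: Iterable[str], engine: DictSynonymEngine
) -> Iterator[Tuple[str, int]]:
    """Yield (term, position_increment) pairs.

    Each original token advances the position by 1; its synonyms are
    emitted with increment 0 so they share the original token's position
    (a sketch of the idea, not Lucene's TokenStream machinery).
    """
    for token in tokens:
        yield token, 1
        for synonym in engine.get_synonyms(token):
            yield synonym, 0


if __name__ == "__main__":
    engine = DictSynonymEngine({"quick": ["fast", "speedy"]})
    print(list(synonym_filter("the quick fox".split(), engine)))
    # [('the', 1), ('quick', 1), ('fast', 0), ('speedy', 0), ('fox', 1)]
```

Because the lookup happens one token at a time, this approach cannot match multi-word synonyms, which is one of the things the Java SynonymFilter (LUCENE-3233) adds.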


regards 
Thomas
--
OrbiTeam Software GmbH & Co. KG
Endenicher Allee 35
53121 Bonn - Germany
http://www.orbiteam.de



