lucene-solr-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject DoubleMetaphone bugs?
Date Wed, 12 Aug 2009 20:29:46 GMT
I'm converting many of the TokenFilters to the new Lucene attribute API...
I'm currently on DoubleMetaphone, but something looks wrong.

      // If we did not add something, then go to the next one...
      if( !isPhonetic ) {
        t = next(in);
        if( t != null ) {
          t.setPositionIncrement( t.getPositionIncrement()+1 );
        }
        return t;
      }

It looks like if DoubleMetaphone didn't add any tokens, then the
*next* token is returned and indexed without any variants?  That
doesn't make sense, and it also seems like it could mess up the token
ordering, since the original token (in the case of inject==true)
hasn't even been returned yet.
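
To make the concern concrete, here is a minimal standalone sketch of one
reading of that branch. It is not the Solr filter: Tok, run, and the
"unencodable" term are hypothetical stand-ins, and it assumes the branch is
reached before the current token has been emitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SkippedTokenSketch {
    // Hypothetical stand-in for a Lucene Token: a term plus a position increment.
    static final class Tok {
        final String term;
        int posInc = 1;
        Tok(String term) { this.term = term; }
    }

    // Mimics the quoted branch. "unencodable" is an assumed term for which no
    // phonetic variant is produced (isPhonetic == false).
    static List<String> run(List<String> input, String unencodable) {
        Iterator<String> in = input.iterator();
        List<String> out = new ArrayList<>();
        while (in.hasNext()) {
            Tok current = new Tok(in.next());
            boolean isPhonetic = !current.term.equals(unencodable);
            if (!isPhonetic) {
                // As in the quoted code: pull the *next* token, bump its
                // increment, and return it. "current" is never emitted.
                if (!in.hasNext()) {
                    break;
                }
                Tok t = new Tok(in.next());
                t.posInc = t.posInc + 1;
                out.add(t.term + "/+" + t.posInc);
                continue;
            }
            out.add(current.term + "/+" + current.posInc);
        }
        return out;
    }

    public static void main(String[] args) {
        // The middle token is the one with no phonetic form in this sketch.
        System.out.println(run(Arrays.asList("foo", "123", "bar"), "123"));
    }
}
```

Under these assumptions the middle token never reaches the output, and the
token after it is emitted with a larger position increment and without
passing through the phonetic step itself.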

I also couldn't find any documentation on exactly how "inject" is
supposed to work.  DoubleMetaphoneFilterFactory doesn't appear at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and the
javadoc doesn't give any clues.
We also have PhoneticFilter... should DoubleMetaphoneFilterFactory be
deprecated?


-Yonik
http://www.lucidimagination.com
