lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: StandardFilter not handling dots as exptected ?
Date Thu, 06 Aug 2009 15:20:47 GMT
Ian Lea wrote:
> See https://issues.apache.org/jira/browse/LUCENE-1068 which appears to
> be talking about the same sort of thing, and
> StandardAnalyzer.setReplaceInvalidAcronym(b).
>
> Quite how you deal with this in your own analyzer is left as an exercise ...
>   

Yes, I think you are right, though I don't fully understand it.


        // Tokenize the dotted form with a trailing dot.
        TokenStream ts = analyzer.tokenStream("content", new StringReader("R.E.S."));
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println("R.E.S. parsed to :" + t);
        }

        // Tokenize the same text without the trailing dot.
        ts = analyzer.tokenStream("content", new StringReader("R.E.S"));
        while ((t = ts.next()) != null) {
            System.out.println("R.E.S parsed to :" + t);
        }

This code outputs:

R.E.S. parsed to :(res,0,6,type=<ACRONYM>)
R.E.S parsed to :(r.e.s,0,5,type=<HOST>)

So from my perspective, I cannot see why it thinks R.E.S is a HOST when it
should be an acronym. Also, for the one that is recognised as an acronym, I
expected it to end up as r.e.s, not res.
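
One possible reading of the output above, as a minimal sketch rather than the actual Lucene code: assume the tokenizer only tags text as ACRONYM when every letter, including the last, is followed by a dot, that dotted text without a trailing dot falls through to the HOST rule, and that StandardFilter then removes the dots from ACRONYM tokens only. The class AcronymSketch, its methods, and both regular expressions are hypothetical approximations written for illustration; nothing here calls Lucene.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class AcronymSketch {
    // Assumption: approximates the ACRONYM rule (letter, dot, repeated at least twice).
    private static final Pattern ACRONYM = Pattern.compile("(?:[A-Za-z]\\.){2,}");
    // Assumption: approximates the HOST rule (alphanumeric runs joined by dots, no trailing dot).
    private static final Pattern HOST = Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+");

    // Returns the token type this sketch would assign to the whole input.
    static String classify(String text) {
        if (ACRONYM.matcher(text).matches()) return "<ACRONYM>";
        if (HOST.matcher(text).matches()) return "<HOST>";
        return "<ALPHANUM>";
    }

    // Lowercases the text and strips dots only when it was classified as an acronym.
    static String normalize(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        return classify(text).equals("<ACRONYM>") ? lower.replace(".", "") : lower;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"R.E.S.", "R.E.S"}) {
            System.out.println(s + " parsed to :(" + normalize(s) + ",type=" + classify(s) + ")");
        }
    }
}
```

Under those assumptions the sketch gives res with type ACRONYM for R.E.S. and r.e.s with type HOST for R.E.S, which matches the terms and types printed above. Whether that is the intended behaviour is what LUCENE-1068 seems to be discussing.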
