lucene-java-user mailing list archives

From lis...@alphamatrix.org
Subject Re: how to remove the dash
Date Mon, 25 Jun 2012 20:10:02 GMT
You are right... I'm not getting the hyphen inside any token... but it is still
used as the "prohibit operator".

This is my output:
Test: bebidas - agua
Query: bebidas -agua
Tokens:
1: [bebidas:0->7:<ALPHANUM>]
2: [agua:10->14:<ALPHANUM>]

Test is the original string.
Thanks
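
A possible workaround, sketched under assumptions rather than confirmed anywhere in this thread: the token output above contains no hyphen, so the "-" appears to be consumed by the query parser before the analyzer runs. If that is the case, one option is to drop hyphens that stand alone between whitespace from the raw query string before parsing. The class and method names below are hypothetical, and the sketch uses only java.util.regex, not any Lucene API. A hyphen attached to a term (such as "-agua") is left untouched so that deliberate prohibit clauses keep working.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: cleans a raw query string
// before it is handed to a query parser.
public class HyphenPreprocessor {

    // Matches a hyphen that has no non-whitespace character directly
    // before it and none directly after it, i.e. a free-standing "-".
    private static final Pattern STANDALONE_HYPHEN =
            Pattern.compile("(?<!\\S)-(?!\\S)");

    // Collapses runs of whitespace left behind after removal.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    public static String stripStandaloneHyphens(String rawQuery) {
        // Remove free-standing hyphens, then normalize the spacing.
        String withoutHyphens = STANDALONE_HYPHEN.matcher(rawQuery).replaceAll("");
        return WHITESPACE_RUN.matcher(withoutHyphens).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // The test string from the output above.
        System.out.println(stripStandaloneHyphens("bebidas - agua"));
        // A hyphen attached to a term is kept as-is.
        System.out.println(stripStandaloneHyphens("bebidas -agua"));
    }
}
```

The cleaned string would then be passed to the query parser in place of the original. This is a pre-processing step, so it only fits if changing the raw query text before parsing is acceptable.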

On Monday, 25 June 2012 19:28:06, Steven A Rowe wrote:
> I added the following to both TestStandardAnalyzer and TestClassicAnalyzer
> in branches/lucene_solr_3_6/, and it passed in both cases:
> 
>   public void testWhitespaceHyphenWhitespace() throws Exception {
>     BaseTokenStreamTestCase.assertAnalyzesTo
>       (a, "drinks - water", new String[]{"drinks", "water"});
>   }
> 
> So I'm not seeing the same behavior as you guys - the hyphen is not part of
> any emitted token.
> 
> Steve
> 
> -----Original Message-----
> From: listas@alphamatrix.org [mailto:listas@alphamatrix.org]
> Sent: Monday, June 25, 2012 11:33 AM
> To: java-user@lucene.apache.org
> Subject: Re: how to remove the dash
> 
> On Monday, 25 June 2012 16:10:38, Ian Lea wrote:
> > My apologies - you are right.
> > 
> > With both ClassicAnalyzer and StandardAnalyzer, "drinks - water" comes
> > out as "drinks -water" whereas "drinks-water" comes out as "drinks
> > water", as I'd expected.
> > 
> > I guess this is fixable in JFlex, or I think there is some replace
> > tokenizer somewhere that can replace character X with character Y e.g.
> > "-" with " ".  Or pre-process your text/queries with a regexp.  Maybe
> > someone else has better ideas.
> 
> I guess the same... I'm already using my own Tokenizer (based on
> StandardTokenizer) to mark some strings for replacement or removal, and I'm
> using one filter to replace them and another filter to remove them... I tried
> to do the same with the "-" but it didn't work... I can't even mark the "-".
> I'm avoiding pre-processing...
> I'm hoping that somebody can tell me what I can change in the
> StandardTokenizer JFlex grammar to change this behavior.
> 
> Thanks
> 
> > --
> > Ian.
> > 
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> 
