lucene-java-user mailing list archives

From "Daniel Taurat" <daniel.tau...@gaussvip.com>
Subject RE: jaspq: dashed numerical values tokenized differently
Date Wed, 03 Nov 2004 16:27:43 GMT


Best regards,
    Dr. Daniel Taurat
    Senior Consultant

Gauss Interprise AG        Phone:  +49-40-3250-1508
Weidestr. 120 a            Mobile: +49-173-2418472
D-22083 Hamburg, Germany   Fax:    +49-40-3250-191508

E-Mail: daniel.taurat@gaussvip.com
Web:    http://www.gaussvip.com


> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Wednesday, November 3, 2004 16:49
> To: Lucene Users List
> Subject: Re: jaspq: dashed numerical values tokenized differently
> 
> On Nov 3, 2004, at 10:21 AM, Daniel Taurat wrote:
> > Checked with Luke on the string
> > dash\-123\-01
> >
> > and got
> >
> > dash
> > 123
> > 01
> >
> > with GermanAnalyzer and StandardAnalyzer
> > and
> >
> > dash
> >
> > with all the others, except for WhitespaceAnalyzer, of course.
> >
> >
> > This makes me think that an escaped dash is never a minus, somehow.
> 
> No built-in Analyzer considers backslash an escape character - most
> consider it a delimiter between tokens and throw it away, as you've
> seen.  Only QueryParser has the escape character feature.
> 
> 	Erik

Okay, that I understand...

But then where do the dashes, I mean the minuses (**sigh**), go?

-123 becomes 123 for some analyzers (German and Standard), is discarded completely by others
(Russian, Simple, Stop), and Whitespace does its own thing again and keeps it as -123.

Ah, now I've got it. Since

-123go

becomes

go for Russian, Stop and Simple, but
123go for German and Standard,

I guess the first group simply drops digits altogether, so that they effectively act as
separators (I checked that as well), while the second group only drops the leading minus (dash?).
The grouping is caused by inheritance.
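
To make the two groups concrete, here is a minimal sketch in plain Java with no Lucene dependency. It is not Lucene's code: the class TokenizerSketch and its three methods are made-up stand-ins that only reproduce the splitting observed above. lettersOnly treats every non-letter as a separator (the behaviour seen with Russian, Simple and Stop), lettersAndDigits treats every character that is neither a letter nor a digit as a separator (a rough approximation of what German and Standard showed for these inputs), and whitespaceOnly splits on whitespace alone. Lower-casing and stop-word removal are left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Simplified stand-ins, NOT Lucene's tokenizers: each method only mimics
// the splitting behaviour reported in this thread.
class TokenizerSketch {

    // Collects maximal runs of characters accepted by isTokenChar;
    // every other character acts as a separator and is dropped.
    static List<String> split(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Assumed model for the Russian/Simple/Stop group: letters only.
    static List<String> lettersOnly(String text) {
        return split(text, Character::isLetter);
    }

    // Assumed model for the German/Standard group: letters and digits.
    static List<String> lettersAndDigits(String text) {
        return split(text, Character::isLetterOrDigit);
    }

    // Assumed model for Whitespace: anything that is not whitespace.
    static List<String> whitespaceOnly(String text) {
        return split(text, c -> !Character.isWhitespace(c));
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("-123go"));              // [go]
        System.out.println(lettersAndDigits("-123go"));         // [123go]
        System.out.println(whitespaceOnly("-123go"));           // [-123go]
        System.out.println(lettersOnly("dash\\-123\\-01"));      // [dash]
        System.out.println(lettersAndDigits("dash\\-123\\-01")); // [dash, 123, 01]
    }
}
```

In this model the minus is never treated specially: it is just one more non-token character, exactly like the backslash, which is why it disappears in both groups and survives only when whitespace is the sole separator.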

Daniel


 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org

