lucene-java-user mailing list archives

From Arnold Leung <aleu...@shaw.ca>
Subject Re: snowball (english) and filenames
Date Fri, 18 May 2007 05:29:08 GMT

On 16-May-07, at 11:00 PM, Doron Cohen wrote:

> If you enter a.b.c.d.e.f.g.h to that demo you'll see that
> the demo simply breaks the input text on '.' - that has
> nothing to do with filenames.

That is not what I am seeing in my testing:

a.b.c.d.e.f.g.h is not broken apart the way the snowball demo
indicates it should be.

At http://snowball.tartarus.org/demo.php

"a.b.c.d.e.f.g.h" shows:

a -> a
b -> b
c -> c
d -> d
e -> e
f -> f
g -> g
h -> h

For my Lucene testing, I indexed one text file containing a single
"a.b.c.d.e.f.g.h" string and opened the index in Luke.
The only term indexed was a.b.c.d.e.f.g.h; the string was not split
on the periods.


As a real-world example, Logon.dll is being converted to "Logon.dl"
rather than to "Logon" and "dll", as the snowball demo indicates.

Also:

Demo:
some-msp.msp

somemsp -> somemsp
msp -> msp

Lucene:
some-msp.msp

some
msp.msp
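
For illustration only, here is a minimal sketch of the demo-style splitting in plain Java, with no Lucene or Snowball classes involved. The class and method names are hypothetical, and it assumes the demo simply breaks its input on every character that is not a letter or digit before stemming. Under that assumption the hyphen is a separator too, so this sketch would not reproduce the "somemsp" line shown above; it is not a description of what the demo or any Lucene analyzer actually does internally.

```java
import java.util.ArrayList;
import java.util.List;

public class DemoLikeSplit {

    // Hypothetical helper: break the text on every run of characters that
    // are neither letters nor digits, and drop empty pieces.
    static List<String> splitOnNonAlphanumerics(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The three inputs discussed in this message.
        System.out.println(splitOnNonAlphanumerics("a.b.c.d.e.f.g.h"));
        System.out.println(splitOnNonAlphanumerics("Logon.dll"));
        System.out.println(splitOnNonAlphanumerics("some-msp.msp"));
    }
}
```

Running a split like this over the text before it reaches the stemmer would keep "Logon" and "dll" as separate terms, so the stemmer would never see "Logon.dll" as a single word.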



