lucene-java-user mailing list archives

From "Hackl, Rene" <Rene.Ha...@FIZ-Karlsruhe.DE>
Subject Re: derive tokens from single token
Date Mon, 29 Sep 2003 15:01:54 GMT
Thank you very much for your feedback, guys! 

Erik, Pierrick, thanks a lot for your code, I'm trying to adapt either
approach right now.

> except that you'll be indexing a ton of 
> terms I'd guess.  If there is some other way to split these words by 
> separating by prefix ("hexa", "hepta") and suffix ("alene", "alin") it 
> would likely be better.  But maybe its not practical to do so.

There'll be at least two indexes: a "normal" one and a bloated one.
Dan suggested splitting, too, but, unfortunately, if users search for e.g.

"9-Oxabicyclo[3.3.1]nona-2,6-diene"

they don't want anything other than that substance, as opposed to

"*-Oxabicyclo[3.3.1]nona*" 

where they'd be interested in substances from that family - whatever the
numbers are.
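
To illustrate why the second index gets bloated, here is a rough sketch of
one way to derive tokens from a single token. It is an assumption about the
general technique, not the code Erik or Pierrick posted: every suffix of the
name is emitted as its own term, so that a query like "*-Oxabicyclo[3.3.1]nona*"
reduces to a prefix match against one of those suffixes. The class and method
names (SuffixTokens, suffixes, matchesInfix) are made up for this example, and
it uses plain Java with no Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the thread or from Lucene.
final class SuffixTokens {
    private SuffixTokens() {}

    /** Returns every non-empty suffix of the token, longest first. */
    static List<String> suffixes(String token) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < token.length(); i++) {
            out.add(token.substring(i));
        }
        return out;
    }

    /**
     * True if "*infix*" would match the token, i.e. if at least one
     * derived suffix starts with the infix (a plain prefix match).
     */
    static boolean matchesInfix(String token, String infix) {
        for (String suffix : suffixes(token)) {
            if (suffix.startsWith(infix)) {
                return true;
            }
        }
        return false;
    }
}
```

With this scheme a name produces one term per character, which is where the
extra index size comes from. Exact-name searches would still go to the
"normal" index, so they return only that one substance.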

If you're interested, I could post some hard performance results to the
list once I have them at hand.

Best regards,

René Hackl
