lucene-java-user mailing list archives

From "Philippe Laflamme" <plafla...@konova.com>
Subject RE: inter-term correlation [was Re: Vector Space Model in Lucene?]
Date Tue, 18 Nov 2003 14:53:21 GMT
> Even if that implementation wasn't fast (which it should be), it should be
> fairly easy to implement it to be pretty much as efficient as any of basic
> tokenizers; ie. not much slower than full scanning speed over text data and
> token creation overhead.

In terms of speed I would tend to agree with you.

My question about efficiency was aimed more at the quality of the results
it provides. Is the BreakIterator breaking on correct sentence boundaries,
or is it being confused by dots at the end of acronyms and such?

Karsten mentioned that its results are of higher quality when you
prevent it from breaking after a number. Are there any other tips you can
provide?

Has anybody tested the implementation to estimate its precision?
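
For anyone who wants to eyeball the boundaries on their own text, here is a
minimal sketch using the standard java.text.BreakIterator sentence instance.
The class name, the helper method and the sample string are made up for
illustration, and the choice of Locale.US is an assumption. I am not claiming
any particular output: where the breaks land around abbreviations and numbers
is exactly what this is meant to let you inspect.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of Lucene.
public class SentenceSplitDemo {

    // Splits text at the boundaries reported by the JDK sentence BreakIterator.
    // The returned segments, concatenated in order, reproduce the input.
    static List<String> splitSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample with abbreviations and a decimal number, the cases in question.
        String sample = "Dr. Smith works at I.B.M. in the U.S. "
                + "He paid 3.50 dollars. Then he left.";
        for (String s : splitSentences(sample, Locale.US)) {
            System.out.println("[" + s.trim() + "]");
        }
    }
}
```

Running this over a hand-segmented sample and counting the mismatches would
give a rough precision estimate.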

Regards,
Phil

> -----Original Message-----
> From: Tatu Saloranta [mailto:tatu@hypermall.net]
> Sent: November 17, 2003 22:00
> To: Lucene Users List
> Subject: Re: inter-term correlation [was Re: Vector Space Model in Lucene?]
>
>
> On Monday 17 November 2003 07:40, Chong, Herb wrote:
> > i don't know what the Java implementation is like but the C++ one is very
> > fast.
> ...
> >> I personally do not have any experience with the BreakIterator in Java. Has
> >> anyone used it in any production environment? I'd be very interested to
> >> learn more about it's efficiency.
>
> Even if that implementation wasn't fast (which it should be), it should be
> fairly easy to implement it to be pretty much as efficient as any of basic
> tokenizers; ie. not much slower than full scanning speed over text data and
> token creation overhead.
>
> -+ Tatu +-
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org

