lucy-dev mailing list archives

From Peter Karman <pe...@peknet.com>
Subject Re: [lucy-dev] Highlighter excerpt boundaries
Date Fri, 20 Jan 2012 15:23:21 GMT
On 1/19/12 6:52 PM, Marvin Humphrey wrote:
>
> It's rare that we need to optimize for performance.  Most of the time we
> should be optimizing for maintainability.

+1

> I suspect that at some point we will want to expose sentence boundary
> detection via a public API, because people who subclass Highlighter may want
> to use it.

+1 here too.

I have been putting some work into sentence boundary detection in 
Search::Tools, and I would love to see some thinking amongst the bright 
people here about how best to do it.

>
> It seems to me that publishing UAX #29 sentence boundary detection via an
> Analyzer is a conservative API extension, since it's so closely related to the
> UAX #29 word boundary detection we expose via StandardTokenizer.
>
> So that explains what I was thinking.  But of course refactoring sentence
> boundary detection into a string utility function also achieves the end of
> cleaning up Highlighter.c just as effectively, and might be more elegant --
> who knows?
>
> Until we actually expose this capability via a public API, either approach
> should work fine.

Agreed here too.



-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com
