lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject [lucy-dev] Re: [KinoSearch] Highlighter Bug
Date Wed, 17 Nov 2010 23:13:12 GMT
> > I had a closer look and the error case is that you have three sentences 
> > where only the first and the last contain keywords but the middle one is 
> > chosen for the excerpt.

Found the bug.  S_has_heat() expects a length but was being passed an offset.

As a result, S_has_heat() was approving an excerpt -- because the excerpt had
"heat", meaning a warm spot in the HeatMap -- but the warm spot actually lay
outside the excerpt's boundaries.  Thus an excerpt which should have been
rejected was being approved.  Or, to be precise, the truncation of the excerpt
to end on a particular sentence boundary was approved when it should not have
been.
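
To illustrate the class of bug, here is a minimal sketch in Python rather than
in the C of the actual code.  The Span class, the has_heat() function, and all
of the numbers are hypothetical and are not taken from the KinoSearch source;
the sketch only assumes that the check takes a start offset plus a length.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for one warm spot in a heat map."""
    offset: int    # start position of the warm spot in the text
    length: int    # number of characters it covers
    weight: float  # how "hot" the spot is


def has_heat(spans, offset, length):
    """Return True if any warm spot overlaps [offset, offset + length)."""
    end = offset + length
    for span in spans:
        span_end = span.offset + span.length
        if span.weight > 0 and span.offset < end and span_end > offset:
            return True
    return False


# One warm spot at 220..223.  The candidate excerpt starts at 100 and
# would be truncated at a sentence boundary at 150.
spans = [Span(offset=220, length=3, weight=1.0)]
start, boundary = 100, 150

# Bug pattern: an offset is passed where a length is expected, so the
# range actually checked is [100, 250) and the warm spot is "found".
buggy = has_heat(spans, start, boundary)

# Corrected call: convert the end offset to a length first, so the
# range checked is [100, 150) and the truncation is rejected.
fixed = has_heat(spans, start, boundary - start)
```

With these made-up numbers the first call approves the truncation even though
the warm spot lies beyond the boundary, and the second call rejects it.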

Before:

    bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
    bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
    bla.<strong></strong>

After (the &#8230; is a Unicode ellipsis):

    bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
    bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla.  bla
    bla bla <strong>MMM</strong> bla bla bla bla bla bla bla bla bla
    bla&#8230;

I've committed the fix as r6485 to the KinoSearch repository.  It would be
nice to augment that with the test case you provided and commit both to Lucy
as well.

Cheers,

Marvin Humphrey

