lucy-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-user] UTF-8 Error
Date Thu, 16 Aug 2012 17:28:15 GMT
On Thu, Aug 16, 2012 at 8:25 AM, Lee Goddard <leegee@gmail.com> wrote:
> If I do this
>
>     $highlighter->create_excerpt($hit);
>
> I get this:
>
>         Invalid UTF-8 header byte: 00000095
>         lucy_StrHelp_decode_utf8_char
>          at .../Lucy-0.3.2/core/Lucy/Util/StringHelper.c line 216
>
> I'm not doing anything to make sure my search subject is or isn't utf8, I've
> only been playing with Lucy for a few hours.
>
> I just thought this looked like the sort of error a Perl user isn't intended
> to see.

Yes, that's right.  I'm surprised to see that error.  Lucy attempts to force
everything to UTF-8 on input so that it doesn't have to track encoding
internally.  That error message means that a sanity check failed.
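
As an illustration of why a byte like 0x95 trips such a check: in UTF-8, bytes
in the range 0x80-0xBF are continuation bytes and can never start a character,
so 0x95 is not a legal header byte.  (In Windows-1252, 0x95 happens to be the
bullet character, which is one plausible source for a stray byte like this,
though that is only a guess here.)  The following is a minimal sketch, not
Lucy's own code, that uses the core Encode module to test whether a byte
string is well-formed UTF-8.  The helper name is_valid_utf8 and the sample
strings are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw( decode FB_CROAK LEAVE_SRC );

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
# Strict 'UTF-8' decoding with FB_CROAK dies on malformed input, and the
# eval turns that into a false result.  LEAVE_SRC keeps decode() from
# modifying $bytes.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 1 : 0;
}

# "caf" followed by the two-byte UTF-8 encoding of U+00E9.
print is_valid_utf8("caf\xC3\xA9") ? "valid\n" : "invalid\n";

# A lone 0x95 byte, as in the error message above.
print is_valid_utf8("bullet \x95 item") ? "valid\n" : "invalid\n";
```

The first string should be accepted and the second rejected.  Running a check
like this over the raw source text before indexing is one way to find out
whether a document contains bytes that are not UTF-8.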

To debug, we will need to see the content of that HitDoc's value for the field
being highlighted.  I would suggest the following code to dump its internals
to STDERR:

    use Devel::Peek qw( Dump );
    # $field_name holds the name of the field being highlighted.
    Dump($hit->{$field_name});

If you can forward the resulting dump to this list, it may help us to
understand what kind of error is occurring.

Alternatively, if you're feeling ambitious, you can try to isolate a small,
repeatable test case.

Marvin Humphrey
