From: Marvin Humphrey
To: user@lucy.apache.org, lee@leegoddard.net
Date: Thu, 16 Aug 2012 10:28:15 -0700
Subject: Re: [lucy-user] UTF-8 Error

On Thu, Aug 16, 2012 at 8:25 AM, Lee Goddard wrote:
> If I do this
>
>     $highlighter->create_excerpt($hit);
>
> I get this:
>
>     Invalid UTF-8 header byte: 00000095
>     lucy_StrHelp_decode_utf8_char
>     at .../Lucy-0.3.2/core/Lucy/Util/StringHelper.c line 216
>
> I'm not doing anything to make sure my search subject is or isn't utf8, I've
> only been playing with Lucy for a few hours.
>
> I just thought this looked like the sort of error a Perl user isn't intended
> to see.

Yes, that's right.  I'm surprised to see that error.  Lucy attempts to force
everything to UTF-8 on input so that it doesn't have to track encoding
internally.  That error message means that a sanity check failed.

To debug, we will need to see the content of that HitDoc's value for the field
being highlighted.  I would suggest the following code to dump its guts into
STDERR...

    use Devel::Peek qw( Dump );
    Dump($hit->{$field_name});

If you can forward the resulting dump to this list, it may help us to
understand what kind of error is occurring.  Alternatively, if you're feeling
ambitious, you can try to isolate a small, repeatable test case.

Marvin Humphrey
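
For anyone reading this in the archive: byte 0x95 is 10010101 in binary, which
matches the UTF-8 continuation pattern 10xxxxxx, so it can never begin a
character. That is what "Invalid UTF-8 header byte" is complaining about. The
sketch below is only an illustration and is not Lucy's own check. The helper
names classify_header_byte and is_valid_utf8 are hypothetical, and the idea
that the source text was Windows-1252 (where 0x95 is a bullet) is an
assumption that the thread does not confirm.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: describe the role a byte value (0..255) can play
# at the start of a UTF-8 sequence.
sub classify_header_byte {
    my ($byte) = @_;
    return 'ascii'        if $byte <= 0x7F;    # 0xxxxxxx
    return 'continuation' if $byte <= 0xBF;    # 10xxxxxx, never a header
    return 'invalid'      if $byte <= 0xC1;    # would be an overlong form
    return 'lead2'        if $byte <= 0xDF;    # 110xxxxx
    return 'lead3'        if $byte <= 0xEF;    # 1110xxxx
    return 'lead4'        if $byte <= 0xF4;    # 11110xxx, up to U+10FFFF
    return 'invalid';                          # 0xF5..0xFF
}

# Hypothetical helper: true if the byte string decodes as strict UTF-8.
# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps it
# from modifying the source string.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        Encode::decode( 'UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC() );
        1;
    };
    return $ok ? 1 : 0;
}

# Assumed sample data: a lone 0x95 byte, as a Windows-1252 bullet would be.
my $suspect = "bullet \x95 item";
printf "0x95 is: %s\n", classify_header_byte(0x95);
printf "suspect string valid UTF-8? %s\n",
    is_valid_utf8($suspect) ? 'yes' : 'no';
```

Running a check like is_valid_utf8 over field values before handing them to
the indexer is one way to narrow down which document carries the bad byte,
which may help when isolating a small test case.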