From: Daniel Naber
To: "Lucene Users List"
Subject: Re: demo IndexHTML parser breaks unicode?
Date: Fri, 24 Sep 2004 21:17:28 +0200

On Friday 24 September 2004 19:58, Fred Toth wrote:

> I've got unicode in my source HTML. In particular, within meta tags,
> and it's getting broken by the indexer. Note that I'm not trying to
> query on any of this, just store and retrieve document titles with
> unicode characters.

Please try again with the code from CVS. Christoph Goller committed a fix
for this problem (at least I think it was this problem) 1-3 weeks ago.

Regards
 Daniel

-- 
http://www.danielnaber.de