Message-ID: <41650402.1000105@asml.nl>
Date: Thu, 07 Oct 2004 10:53:22 +0200
From: Daan Hoogland
To: Lucene Users List
Subject: Re: indexing numeric entities?
References: <4164DB7F.9070409@asml.nl>
In-Reply-To: <4164DB7F.9070409@asml.nl>

Daan Hoogland wrote:

>Hello,
>
>Does anyone do indexing of numeric entities for Japanese characters? I
>have (non-x)html containing those entities and need to index and search
>them.

Can the CJKAnalyzer index a string like "●入社"? It seems to be ignored
completely when used with the demo. There was talk on this list of fixes
for the demo HTMLParser; do these address this issue? When I look at the
code, it seems that the entities should have been interpreted before
indexing. What am I missing?

Any comments, please? Or a pointer to a howto for dumm^H^H^H^H^H westerners?

thanks,
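To make "interpreted before indexing" concrete: the decimal references for the three characters in the example string are &#9679;, &#20837; and &#31038;. Below is a minimal sketch, in Python purely for illustration, of a preprocessing step that turns such references into real characters before the text reaches an analyzer. It is not the demo HTMLParser's code, and the name decode_numeric_entities is hypothetical; it only assumes that the input holds decimal or hexadecimal numeric references.

```python
import re

# Matches decimal (&#20837;) and hexadecimal (&#x5165;) character references.
_NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_entities(text):
    """Replace numeric character references with the characters they name.

    Hypothetical helper for illustration; named entities such as &amp;
    are deliberately left alone.
    """

    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Code point outside the Unicode range: keep the reference as-is.
            return match.group(0)

    return _NUMERIC_ENTITY.sub(_replace, text)
```

Calling decode_numeric_entities("&#9679;&#20837;&#31038;") returns the three-character string "●入社". If the indexed field still contains the literal "&#..." text, a tokenizer would see only ASCII punctuation and digits rather than CJK characters, which would be one possible explanation for the string appearing to be ignored.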