From: "Hannah c"
To: lucene-user@jakarta.apache.org
Subject: Problem indexing Spanish Characters
Date: Wed, 19 May 2004 15:30:41 +0000

Hi,

I am indexing a number of English articles on Spanish resorts. As a result, there are Spanish characters throughout the text, mostly in place names, which are exactly the kind of words I would like to use as queries.

My problem is with the StandardTokenizer class, which cuts a word in two whenever it comes across one of the Spanish characters. I had a look at the source, but the code was generated by JavaCC and so is not very readable.

Is there a way around this problem, or which area of the code would I need to change to avoid it?

Thanks,
Hannah Cumming
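
The symptom described above can be illustrated without Lucene. The following is only a minimal sketch: it does not reproduce StandardTokenizer's JavaCC grammar, and the class and method names (AccentTokenDemo, asciiTokens, unicodeTokens) are hypothetical. It assumes the split happens because the accented letter is not counted as a token character, and contrasts an ASCII-only letter test with the Unicode-aware Character.isLetterOrDigit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class AccentTokenDemo {

    // Splits text into maximal runs of code points accepted by isTokenChar.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-token character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Assumed cause of the symptom: only unaccented letters and digits count.
    static boolean isAsciiLetterOrDigit(int cp) {
        return (cp >= 'a' && cp <= 'z')
            || (cp >= 'A' && cp <= 'Z')
            || (cp >= '0' && cp <= '9');
    }

    static List<String> asciiTokens(String text) {
        return tokenize(text, AccentTokenDemo::isAsciiLetterOrDigit);
    }

    // Unicode-aware variant: accented letters are letters too.
    static List<String> unicodeTokens(String text) {
        return tokenize(text, Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        // Escapes keep the example independent of the source file encoding.
        String text = "Hotels in M\u00e1laga and Legan\u00e9s";
        System.out.println(asciiTokens(text));   // [Hotels, in, M, laga, and, Legan, s]
        System.out.println(unicodeTokens(text)); // place names stay whole
    }
}
```

If the ASCII-only variant matches what the index contains, the letter definition used by the tokenizer is the first place to look. Whether that is the actual cause in StandardTokenizer is not confirmed here.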