Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
From: "Hannah c" <hannahc7@hotmail.com>
To: lucene-user@jakarta.apache.org
Subject: Problem indexing Spanish Characters
Date: Wed, 19 May 2004 15:30:41 +0000
Mime-Version: 1.0
Content-Type: text/plain; format=flowed
Message-ID: <BAY13-F58tSeAa6Of770001a9e8@hotmail.com>


Hi,

I am indexing a number of English articles on Spanish resorts. As such, 
there are a number of Spanish characters throughout the text, most of them 
in the place names, which are the type of words I would like to use as 
queries. My problem is with the StandardTokenizer class, which cuts a word 
in two when it comes across any of the Spanish characters. I had a look at 
the source, but the code was generated by JavaCC and so is not very readable. 
I was wondering if there is a way around this problem, or which area of the 
code I would need to change to avoid it.
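
To make the symptom concrete, here is a minimal sketch in plain Java. It does 
not use Lucene and is not the StandardTokenizer code; the class TokenSplitDemo 
and its tokenize method are hypothetical names made up for illustration. It 
only shows how a tokenizer whose notion of "letter" is limited to ASCII would 
cut a place name in two at an accented character, while one that accepts any 
Unicode letter would keep it whole. Whether this is what actually happens in 
my setup is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSplitDemo {

    // Splits text into runs of letters (hypothetical helper, not a Lucene API).
    // asciiOnly = true:  only a-z and A-Z count as letters, so an accented
    //                    character ends the current token.
    // asciiOnly = false: Character.isLetter is used, which accepts any
    //                    Unicode letter, including accented ones.
    static List<String> tokenize(String text, boolean asciiOnly) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isLetter = asciiOnly
                    ? (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    : Character.isLetter(c);
            if (isLetter) {
                current.append(c);
            } else if (current.length() > 0) {
                // A non-letter closes the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // \u00e1 is "a" with an acute accent; the escape avoids any
        // dependence on the encoding of this source file.
        String text = "Hotels in M\u00e1laga";
        System.out.println(tokenize(text, true));   // the place name is split into "M" and "laga"
        System.out.println(tokenize(text, false));  // the place name stays one token
    }
}
```

The second call keeps the place name intact, which is the behaviour I am 
after when these names are used as queries.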

Thanks
Hannah Cumming

