lucene-java-user mailing list archives

From "Hannah c" <hanna...@hotmail.com>
Subject Problem indexing Spanish Characters
Date Wed, 19 May 2004 15:30:41 GMT

Hi,

I am indexing a number of English articles on Spanish resorts. As such, 
there are a number of Spanish characters throughout the text, mostly in 
the place names, which are exactly the words I would like to use as 
queries. My problem is with the StandardTokenizer class, which cuts a word 
in two when it comes across any of the Spanish characters. I had a look at 
the source, but the code was generated by JavaCC and so is not very readable. 
Is there a way around this problem, or which area of the code would I need 
to change to avoid it?
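
To make the symptom concrete, here is a minimal sketch in plain Java. It is 
not Lucene's code and does not use the Lucene API; the class and method names 
(TokenSplitDemo, tokenize, asciiTokens, unicodeTokens) are hypothetical, and 
the place name "Málaga" is just an example. It assumes the splitting comes 
from a letter definition that excludes accented characters, which I have not 
confirmed in the JavaCC grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Illustration only: a hand-written splitter, NOT Lucene's StandardTokenizer.
public class TokenSplitDemo {

    // Collects maximal runs of code points accepted by isTokenChar.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-token character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Assumed cause of the problem: only unaccented a-z / A-Z count as letters.
    static boolean isAsciiLetter(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
    }

    static List<String> asciiTokens(String text) {
        return tokenize(text, TokenSplitDemo::isAsciiLetter);
    }

    // Character.isLetter(int) accepts accented letters such as U+00E1.
    static List<String> unicodeTokens(String text) {
        return tokenize(text, Character::isLetter);
    }

    public static void main(String[] args) {
        String place = "M\u00e1laga"; // "Málaga", written with an escape to avoid source-encoding issues
        System.out.println(asciiTokens(place));   // prints [M, laga]
        System.out.println(unicodeTokens(place)); // prints [Málaga]
    }
}
```

The first result is the behaviour I am seeing (the word cut in two at the 
accented character); the second is what I would like to get.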

Thanks
Hannah Cumming




