lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hannah c" <>
Subject Problem indexing Spanish Characters
Date Wed, 19 May 2004 15:30:41 GMT


I  am indexing a number of English articles on Spanish resorts. As such 
there are a number of spanish characters throught the text, most of these 
are in the place names which are the type of words I would like to use as 
queries. My problem is with the StandardTokenizer class which cuts the word 
into two when it comes across any of the spanish characters. I had a look at 
the source but the code was generated by JavaCC and so is not very readable. 
I was wondering if there was a way around this problem or which area of the 
code I would need to change to avoid this.

Hannah Cumming

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message