after i had replaced "QueryParser.jj" with the newest version from cvs, the queryparser accepts my query, and i can now perform ø/æ/å searches from the command line. so i guess there is something wrong with my search servlet's unicode handling :-)

thank you very much!

karl øie/gan media

-----Original Message-----
From: Jonas Bechlund [mailto:jonas.bechlund@framfab.dk]
Sent: 27. november 2001 13:52
To: 'Lucene Users List'
Subject: RE: scandinavian characters.

Hi Karl,

It is a little bit tricky, but once you get the idea it is not that bad...

I had the same problem with the Danish characters. I changed the TOKEN definition in the "Token Definitions" section of the file "QueryParser.jj", and that actually solved the problem.

One minor detail is that you have to rebuild the jar file with Ant (see build.txt for instructions).

I guess that solves your problem,

Regards,
/ Jonas

-----Original Message-----
From: Karl Øie [mailto:karl@gan.no]
Sent: 27 November 2001 13:01
To: Lucene Users List
Subject: RE: scandinavian characters.

there must be something seriously broken with the queryparser code. if a query starts with ø/æ/å (ø, æ, å) then an exception occurs in the queryparser:

org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 1.  Encountered: "\u00c3" (195), after : ""
        at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown Source)
        at org.apache.lucene.queryParser.QueryParser.jj_ntk(Unknown Source)
        at org.apache.lucene.queryParser.QueryParser.Modifiers(Unknown Source)
        at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
        at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
        at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)

but if the query only contains ø/æ/å (ø, æ, å), then it is translated wrongly into the swedish/german ä regardless of which character it was.
if someone could point me to where to start, I could try to find the problem, because I guess it is an erroneous unicode translation...

regards, karl

>no, it's even stranger than that. i have decoded the querystring; the problem
>is that it seems like something is changed on the way in. if i search for
>"fjøs" (fjøs) i get the swedish "fjä" (fjÄ), where ø is
>changed to Ä and 's' is removed.
>
>is the querystring translated somewhere?
>
>regards, karl øie

> -----Original Message-----
> From: David Bonilla [mailto:david@bit-bang.com]
> Sent: 27. november 2001 10:43
> To: Lucene Users List; karl@gan.no
> Subject: Re: scandinavian characters.
>
>
> Hi Karl !!!
>
> I'm Spanish and I have had a lot of problems programming with our non-English
> characters. I use LUCENE with Spanish accents and it works fine...
>
> Have you tried to use the java.net.URLEncoder and java.net.URLDecoder with
> your fields to index ?
>
> Best Regards from Spain !
> __________________________
> David Bonilla Fuertes
> THE BIT BANG NETWORK
> http://www.bit-bang.com
> Profesor Waksman, 8, 6º B
> 28036 Madrid
> SPAIN
> Tel.: (+34) 914 577 747
> Mobile: 656 62 83 92
> Fax: (+34) 914 586 176
> __________________________

--
To unsubscribe, e-mail:
For additional commands, e-mail:
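
Editorial note: the "\u00c3" (195) in the TokenMgrError above is consistent with one common cause of this kind of symptom, namely the UTF-8 bytes of the query being decoded as ISO-8859-1 somewhere before they reach the parser. This is only a plausible reading of the thread, not a confirmed diagnosis. The sketch below reproduces that effect in isolation; the class and method names (MojibakeDemo, misdecode, repair) are made up for illustration, and it assumes a modern JDK with java.nio.charset.StandardCharsets (Java 7 or later), which did not exist in 2001.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Lucene or of anyone's servlet code.
class MojibakeDemo {

    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    // Every non-ASCII character turns into two Latin-1 characters.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the raw bytes via ISO-8859-1, then decode them as UTF-8.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = "fj\u00f8s";          // "fjøs"
        String garbled = misdecode(query);   // 'ø' becomes U+00C3 followed by U+00B8

        // Code point of the first character produced from 'ø'.
        System.out.println((int) garbled.charAt(2)); // prints 195

        // The round trip restores the original query.
        System.out.println(repair(garbled).equals(query)); // prints true
    }
}
```

If this is what happens in the search servlet, the usual remedy is to make the container decode the request with the encoding the browser actually sent, for example by calling ServletRequest.setCharacterEncoding before reading any parameters, rather than repairing the string afterwards.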