lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Davis" <un...@hotmail.com>
Subject Re: How to include strange characters??
Date Sun, 13 Oct 2002 12:15:52 GMT
To Dominator,

Where you able to solve the display problem as well?  I am having a similiar problem with
documents that contain the " (open double quote &#8220).  I am not concerned with searching
on the character, but when I attempt to dsiplay a stored field with this character, it does
not display correctly.  Even stranger, the closing quote &#8221 does display.

To All,

I have browsed through the majority of messages related to Unicode in the archive, and my
reading tells me that Lucene does not normally change the data that is "stored" for a field.
 Can someone give me some pointers on how to troubleshoot this problem.

Note:  I am indexing data that is being pulled from a SQL Server 2000 DB on Windows 2000.

-------------------


In an earlier message Dominator wrote:

> I print out a result string it shows a very strange result, for example
> search for: "civilingeni&rcaron;r" string: "civilingeni&Abreve;¸r".. I'm sure
it's an
> unicode problem, but where can I change it??



Dominator wrote:

thx, with your help I could solve the problem

"karl ie" <karl@gan.no> wrote in message
news:1B2FFF3F-D9E8-11D6-9D40-000393C0F638@gan.no...
i had such problems with norwegian characters and it resolved into
making sure the querystring has the same encoding as the index has.

since this is again a java.lang.String encoding question i had these
problems with querystrings coming from java Servlets and CLI. For both
the quickfix was to re-encode the query in UTF-8/16:

String querystring = argv[0]; ' String querystring =
httprequest.getParameter("query");
querystring = new String(querystring.getBytes("UTF-8"));
...

this fixed my norwegian/samii problems...


mvh karl ie

On mandag, okt 7, 2002, at 13:04 Europe/Oslo, Dominator wrote:

>> I use czech language with more bizzare characters and there is no
>> problem at all. Are you sure, that your XML contains character set
>> information?
>
> yes, I tried <?xml version="1.0" encoding="ISO-8859-2"?> and <?xml
> version="1.0" encoding="UTF-8"?> but I get the same strange characters.
>
>
>
>
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message