lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dominator" <galf...@x-cago.com>
Subject Re: How to include strange characters??
Date Mon, 14 Oct 2002 09:07:28 GMT
I solved my display, by changing the encoding of the browser to unicode

"Chris Davis" <unngh@hotmail.com> wrote in message
news:OE58uiX38V1nis2546L0000a5f6@hotmail.com...
To Dominator,

Where you able to solve the display problem as well?  I am having a similiar
problem with documents that contain the " (open double quote &#8220).  I am
not concerned with searching on the character, but when I attempt to dsiplay
a stored field with this character, it does not display correctly.  Even
stranger, the closing quote &#8221 does display.

To All,

I have browsed through the majority of messages related to Unicode in the
archive, and my reading tells me that Lucene does not normally change the
data that is "stored" for a field.  Can someone give me some pointers on how
to troubleshoot this problem.

Note:  I am indexing data that is being pulled from a SQL Server 2000 DB on
Windows 2000.

-------------------


In an earlier message Dominator wrote:

> I print out a result string it shows a very strange result, for example
> search for: "civilingeni&rcaron;r" string: "civilingeni&Abreve;r".. I'm
sure it's an
> unicode problem, but where can I change it??



Dominator wrote:

thx, with your help I could solve the problem

"karl ie" <karl@gan.no> wrote in message
news:1B2FFF3F-D9E8-11D6-9D40-000393C0F638@gan.no...
i had such problems with norwegian characters and it resolved into
making sure the querystring has the same encoding as the index has.

since this is again a java.lang.String encoding question i had these
problems with querystrings coming from java Servlets and CLI. For both
the quickfix was to re-encode the query in UTF-8/16:

String querystring = argv[0]; ' String querystring =
httprequest.getParameter("query");
querystring = new String(querystring.getBytes("UTF-8"));
...

this fixed my norwegian/samii problems...


mvh karl ie

On mandag, okt 7, 2002, at 13:04 Europe/Oslo, Dominator wrote:

>> I use czech language with more bizzare characters and there is no
>> problem at all. Are you sure, that your XML contains character set
>> information?
>
> yes, I tried <?xml version="1.0" encoding="ISO-8859-2"?> and <?xml
> version="1.0" encoding="UTF-8"?> but I get the same strange characters.
>
>
>
>
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>






--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message