lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fred Toth <ft...@synernet.com>
Subject Re: single quote unicode character
Date Tue, 12 Oct 2004 01:54:32 GMT
Hi Chris,

Getting unicode to pass cleanly through to the browser can
be tricky. A couple of questions:

What do you see instead of the incorrect character?

What happens when you tell your browser to display UTF-8?
(In IE, it's in View->Encoding->Unicode) Does your character
display properly?

Are your pages tagged with charset=utf-8? If not, the browser
will default to non-unicode.

Are you using any view technology? Velocity, for example, has
configuration settings for UTF-8 that control both how its templates
are read from disk and how it encodes final output.

Hope this helps. If not, let us know more about how you're getting
from the index to the page.

Thanks,

Fred

At 09:40 PM 10/11/2004, you wrote:
>I was unaware of luke... a very nice tool.
>It would appear that the characters are correct in the index, but
>somewhere between me receiving the fields on the search and displaying
>them, something is lost... do i need to do any utf-8 encoding/decoding
>when extracting the fields?
>
>
>On Mon, 11 Oct 2004 21:12:13 -0400, Erik Hatcher
><erik@ehatchersolutions.com> wrote:
> > Chris - I suspect something else in your application is getting in the
> > way.  Try to simplify and eliminate the servlet, or use a tool like
> > Luke to see what is truly in the index and what truly is being
> > returned.  Lucene indexes what you tell it (perhaps your analyzer is
> > manipulating things?), and returns what is stored exactly, so I doubt
> > Lucene is the culprit.
> >
> >        Erik
> >
> >
> >
> >
> > On Oct 11, 2004, at 8:50 PM, Chris Fraschetti wrote:
> >
> > > The dataset that I index is pretty dynamic and flexible, and I started
> > > to notice a incorrectly displayed character on some of my results...
> > > some debugging showed that it was a the Unicode character for single
> > > quote which is 8217 decimal. As far as I know, everything is fine
> > > before I index, but when retrieving the content, I receive a character
> > > that cannot be displayed on the java servlet I use to display them.
> > > How can I make lucene be vary general and accept and return all
> > > encoded/non-encoded chars are they were in their original state?
> > >
> > >
> > > --
> > > ___________________________________________________
> > > Chris Fraschetti
> > > e fraschetti@gmail.com
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
>
>
>--
>___________________________________________________
>Chris Fraschetti, Student CompSci System Admin
>University of San Francisco
>e fraschetti@gmail.com | http://meteora.cs.usfca.edu
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message