lucene-java-user mailing list archives

From "Stone, Timothy" <tst...@cityofhbg.com>
Subject RE: How to handle umlauts ?
Date Thu, 19 Sep 2002 13:41:31 GMT
Ian,

I had a similar problem. I overanalyzed it to begin with, blaming the HTML
parser among other things. But I have been humbled.

The catch I found in the demo, depending on how you are rendering your HTML,
is that IndexHTML indexes the HTML entities, but they are not written back
out correctly and show up as "?"s. This is easily fixed in the demo.

In $CATALINA_HOME/webapps/luceneweb/results.jsp, find the code that extracts
the title or summary:

For instance,

String doctitle = doc.get("title");

To get the entity to display correctly, change this to:

String doctitle = Entities.encode( doc.get( "title" ) );

The static method encode( String s ) is defined in the Entities class. Use
it as needed in the JSP for your particular output.
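
In case it helps to see what this kind of encoding amounts to, here is a rough
sketch. It is NOT the demo's Entities class: EntityEncoderSketch is a
hypothetical stand-in I made up for illustration. It assumes that emitting
numeric character references for everything outside ASCII is acceptable for
your pages, and that a missing field (null) should render as an empty string.

```java
// Hypothetical stand-in for illustration only; not the Lucene demo's Entities class.
final class EntityEncoderSketch {
    private EntityEncoderSketch() {}

    // Escapes HTML markup characters and replaces every non-ASCII code point
    // with a numeric character reference, so the output is plain ASCII and
    // renders the same regardless of the response encoding.
    static String encode(String s) {
        if (s == null) {
            return ""; // assumption: a missing stored field is shown as empty
        }
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp < 0x80) {
                        out.append((char) cp);                  // plain ASCII passes through
                    } else {
                        out.append("&#").append(cp).append(';'); // e.g. u-umlaut -> &#252;
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00fc is u-umlaut, written as an escape so the source file stays ASCII.
        System.out.println(encode("J\u00fcrgen M\u00fcller <admin>"));
        // prints: J&#252;rgen M&#252;ller &lt;admin&gt;
    }
}
```

Since the output is pure ASCII, the browser resolves the references itself and
the page encoding stops mattering for these characters.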

HTH,
Tim

> -----Original Message-----
> From: Ian Parkin [mailto:iaparkin@hotmail.com]
> Sent: Wednesday, September 18, 2002 16:56
> To: lucene-user@jakarta.apache.org
> Subject: How to handle umlauts ?
> 
> 
> Hello all,
> 
> I suspect my answer will involve Unicode, but I'd like to make sure that I
> am going down the right path here.
> 
> I have 100,000+ small HTML files that are mainly in the English language. I
> just noticed that we have some user names with umlauts. These are seemingly
> stored and searchable as the '?' character.
> 
> My code is based on the demo code that is provided with Lucene, under the
> 'demo' directory.
> 
> I am wondering what changes I will need to make to handle such characters as
> umlauts within English text?
> 
> Thanks
> 
> IAP
> 
> --
> To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
> 
