lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: encoding problem when retrieving document field value
Date Mon, 03 Mar 2014 17:33:28 GMT
Hi G. Long,

Most likely, the problem is in your application. Lucene does not change the value stored in
the index. For stored fields, Lucene does not interpret entities; the value is just binary data
to Lucene. From your application's perspective, it is String in -> String out. Maybe
you strip the entities when you output the data to the user?
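
One way to see where the character goes missing is to dump the code points of the string at each stage (right after reading it from the index, and again right before output). The sketch below is plain Java, not Lucene code, and the names `EntityCheck`, `decodeNumericEntities` and `dumpCodePoints` are hypothetical. It assumes some layer of the application decodes numeric entities literally. Under that assumption, `&#146;` becomes U+0092, a C1 control character with no visible glyph, while `&#176;` becomes U+00B0 (the degree sign), which would match the symptom described.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityCheck {
    // Decimal numeric character references such as &#176; (at most 6 digits,
    // so the parsed value always stays below the Unicode maximum 0x10FFFF).
    private static final Pattern NUMERIC_ENTITY = Pattern.compile("&#(\\d{1,6});");

    /** Replaces each decimal numeric entity with the code point it names (assumed literal decoding). */
    public static String decodeNumericEntities(String s) {
        Matcher m = NUMERIC_ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ch = new String(Character.toChars(Integer.parseInt(m.group(1))));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Lists every code point as U+XXXX so that invisible characters show up. */
    public static String dumpCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String stored = "l&#146;application n&#176; 90-1258";
        String decoded = decodeNumericEntities(stored);
        // The second code point is a control character, not a printable apostrophe.
        System.out.println(dumpCodePoints(decoded));
        System.out.println(Character.isISOControl(decoded.codePointAt(1)));
    }
}
```

If the dump of the value returned by Lucene still shows the `&#146;` characters, the index is fine and the loss happens later, in whatever decodes or renders the string.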

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: G.Long [mailto:jdevgl@gmail.com]
> Sent: Monday, March 03, 2014 6:09 PM
> To: java-user@lucene.apache.org
> Subject: encoding problem when retrieving document field value
> 
> Hi :)
> 
> My index (Lucene 3.5) contains a field called title. Its value is indexed
> (analyzed and stored) with the WhitespaceAnalyzer and can contain HTML
> entities such as &#146; or &#176;
> 
> My problem is that when I retrieve values from this field, some of the HTML
> entities are missing.
> For example :
> 
> Luke tells me that the stored value is : "l&#146;application n&#176; 90-1258"
> and when I retrieve the field value in my application, I get "l’application n°
> 90-1258".
> 
> The apostrophe is not in the returned value whereas the ° character is
> present.
> 
> What could be the problem?
> 
> Thanks,
> 
> Gary
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

