lucene-java-user mailing list archives

From Daan Hoogland <daan.hoogl...@asml.com>
Subject Re: indexing numeric entities?
Date Thu, 07 Oct 2004 11:05:01 GMT
Maybe it works better inline:

<html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <head>
  <title>japan</title>
 </head>
 <body bgcolor="#FFFFFF" alink="black">
  <p>

&#12501;&#12451;&#12540;&#12523;&#12489;&#12469;&#12540;&#12499;&#12473;&#12456;&#12531;&#12472;&#12491;&#12450;

  </p>

 </body>
</html>

When I index the above document using the HTMLParser demo and the
CJKAnalyzer, the only term found in the content is "japan". That is not
correct, is it?
Should I convert the numeric entities by hand?
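
For reference, converting the entities by hand could look roughly like the
sketch below. It is only an illustration, not part of Lucene or the
HTMLParser demo: the class name NumericEntityDecoder and its decode method
are made up for this example, and it assumes Java 5 or later. It replaces
decimal (&#12501;) and hexadecimal (&#x30D5;) character references with the
characters they stand for and leaves everything else untouched, including
named entities such as &amp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes numeric character references in a string.
final class NumericEntityDecoder {
    // Group 1 captures hexadecimal digits (&#x30D5;), group 2 decimal digits (&#12501;).
    private static final Pattern NUMERIC_REF =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(text, last, m.start());
            try {
                int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    out.appendCodePoint(codePoint);
                } else {
                    // Not a valid Unicode code point: keep the reference as written.
                    out.append(m.group());
                }
            } catch (NumberFormatException e) {
                // Number too large for an int: keep the reference as written.
                out.append(m.group());
            }
            last = m.end();
        }
        // Copy whatever follows the last match.
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

The decoded text would then be passed to the analyzer instead of the raw
parser output, e.g. NumericEntityDecoder.decode(content) before the field
is added to the document.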


Sorry for the mess I sent before.



