lucene-commits mailing list archives

From sar...@apache.org
Subject svn commit: r1234867 - /lucene/dev/trunk/solr/CHANGES.txt
Date Mon, 23 Jan 2012 15:56:07 GMT
Author: sarowe
Date: Mon Jan 23 15:56:06 2012
New Revision: 1234867

URL: http://svn.apache.org/viewvc?rev=1234867&view=rev
Log:
LUCENE-3690: Added info about changes in HTMLStripCharFilter surrogate handling to solr/CHANGES.txt.

Modified:
    lucene/dev/trunk/solr/CHANGES.txt

Modified: lucene/dev/trunk/solr/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/CHANGES.txt?rev=1234867&r1=1234866&r2=1234867&view=diff
==============================================================================
--- lucene/dev/trunk/solr/CHANGES.txt (original)
+++ lucene/dev/trunk/solr/CHANGES.txt Mon Jan 23 15:56:06 2012
@@ -513,6 +513,11 @@ Bug Fixes
     from Unicode character classes [:ID_Start:] and [:ID_Continue:].
   - Uppercase character entities "&QUOT;", "&COPY;", "&GT;", "&LT;", "&REG;",
     and "&AMP;" are now recognized and handled as if they were in lowercase.
+  - The REPLACEMENT CHARACTER U+FFFD is now used to replace numeric character 
+    entities for unpaired UTF-16 low and high surrogates (in the range
+    [U+D800-U+DFFF]).
+  - Properly paired numeric character entities for UTF-16 surrogates are now
+    converted to the corresponding code units.
   - Opening tags with unbalanced quotation marks are now properly stripped.
   - Literal "<" and ">" characters in opening tags, regardless of whether they
     appear inside quotation marks, now inhibit recognition (and stripping) of



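The two added CHANGES.txt entries describe how numeric character entities that name UTF-16 surrogates are handled: an unpaired high or low surrogate becomes U+FFFD, and a properly paired high/low sequence becomes the two corresponding code units. The standalone Java sketch below illustrates that rule only. It is not HTMLStripCharFilter's implementation. The class `SurrogateEntityDemo` and its `decode` method are hypothetical. The sketch also assumes that entities must be terminated by ";", and that non-surrogate values outside the Unicode range are left untouched. Neither assumption is taken from the commit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Lucene's HTMLStripCharFilter code.
class SurrogateEntityDemo {
    // Decimal (&#55357;) or hexadecimal (&#xD83D;) numeric character entity.
    // Digit counts are capped so Integer.parseInt can never overflow.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    private static boolean isHighSurrogate(int cp) { return cp >= 0xD800 && cp <= 0xDBFF; }
    private static boolean isLowSurrogate(int cp)  { return cp >= 0xDC00 && cp <= 0xDFFF; }

    // Numeric value of the entity the matcher is currently positioned on.
    private static int value(Matcher m) {
        return m.group(1) != null ? Integer.parseInt(m.group(1), 16)
                                  : Integer.parseInt(m.group(2));
    }

    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = NUMERIC_ENTITY.matcher(input);
        int pos = 0;
        while (m.find(pos)) {
            out.append(input, pos, m.start());   // copy text before the entity
            int cp = value(m);
            int end = m.end();
            if (isHighSurrogate(cp)) {
                // A high surrogate is kept only if a low-surrogate entity follows immediately.
                Matcher next = NUMERIC_ENTITY.matcher(input).region(end, input.length());
                if (next.lookingAt() && isLowSurrogate(value(next))) {
                    out.append((char) cp).append((char) value(next));
                    pos = next.end();
                    continue;
                }
                out.append('\uFFFD');            // unpaired high surrogate
            } else if (isLowSurrogate(cp)) {
                out.append('\uFFFD');            // unpaired low surrogate
            } else if (Character.isValidCodePoint(cp)) {
                out.appendCodePoint(cp);         // ordinary code point
            } else {
                out.append(m.group());           // out of range: left as-is (sketch assumption)
            }
            pos = end;
        }
        out.append(input, pos, input.length());  // copy any trailing text
        return out.toString();
    }
}
```

With this sketch, `decode("&#xD83D;&#xDE00;")` yields the surrogate pair for U+1F600. A lone `&#xD83D;` followed by ordinary text, or a low surrogate appearing before a high one, yields U+FFFD for each unpaired entity.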