lucene-java-user mailing list archives

From "Jack Krupansky" <>
Subject Re: char mapping in lucene-icu
Date Sat, 15 Feb 2014 01:13:08 GMT
Do you get the exception if you run ant before changing the data files?

"Header authentication failed, please check if you have a valid ICU data file"

Check with the ICU project as to the proper format for THEIR files. I mean, 
this doesn't sound like a Lucene issue.

Maybe it could be as simple as whether the data file should have DOS, UNIX, 
or Mac line endings (CRLF vs. LF vs. CR). Be sure to use an editor that 
satisfies the requirements of ICU.
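
One way to see which line endings a data file actually contains, before handing it to the ICU tools, is simply to count them. The following is a minimal Java sketch; the class name LineEndings is made up for illustration, and whether the ICU tools are sensitive to line endings at all is not confirmed here.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

final class LineEndings {
    /** Returns counts as {CRLF, bare LF, bare CR}. */
    static int[] count(byte[] data) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the LF that belongs to this CRLF pair
                } else {
                    cr++;
                }
            } else if (data[i] == '\n') {
                lf++;
            }
        }
        return new int[] {crlf, lf, cr};
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a data file, e.g. one of the folding .txt files
        int[] c = count(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("CRLF=" + c[0] + " LF=" + c[1] + " CR=" + c[2]);
    }
}
```

A file that mixes styles (more than one non-zero count) is a sign that an editor changed the endings on save.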

To be clear, Lucene itself does not have a published API for modifying the 
mappings of ICU.

-- Jack Krupansky

-----Original Message----- 
Sent: Friday, February 14, 2014 7:48 PM
Subject: char mapping in lucene-icu


I am trying to use the lucene-icu library in solr-4.6.1. I need to change a 
character mapping in lucene-icu. I have made changes


and built the jar file using ant, but it did not help.

I took a look at lucene/analysis/icu/build.xml and saw these lines:

<property name="gennorm2.src.files"
  value="nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt DiacriticFolding.txt DingbatFolding.txt HanRadicalFolding.txt NativeDigitFolding.txt"/>
  <property name="gennorm2.tmp" value="${build.dir}/gennorm2/utr30.tmp"/>
  <property name="gennorm2.dst" 
  <target name="gennorm2" depends="gen-utr30-data-files">
    <echo>Note that the gennorm2 and icupkg tools must be on your PATH. These tools
are part of the ICU4C package. See </echo>
    <mkdir dir="${build.dir}/gennorm2"/>
    <exec executable="gennorm2" failonerror="true">
      <arg value="-v"/>
      <arg value="-s"/>
      <arg value="${}"/>
      <arg line="${gennorm2.src.files}"/>
      <arg value="-o"/>
      <arg value="${gennorm2.tmp}"/>
    </exec>
    <!-- now convert binary file to big-endian -->
    <exec executable="icupkg" failonerror="true">
      <arg value="-tb"/>
      <arg value="${gennorm2.tmp}"/>
      <arg value="${gennorm2.dst}"/>
    </exec>
    <delete file="${gennorm2.tmp}"/>
  </target>
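
Read as a sequence of steps, the gennorm2 target in this build file runs gennorm2 over the source files and then passes the temporary result through icupkg -tb. A rough Java sketch of the same two commands follows; it may be useful as a checklist when running the tools by hand. The class name Utr30Commands and the three command-line arguments are hypothetical; the flags and file names are the ones shown in the build file.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Utr30Commands {
    /** Step 1: compile the text sources into a temporary binary file. */
    static List<String> gennorm2(String srcDir, List<String> srcFiles, String tmpFile) {
        List<String> cmd = new ArrayList<>(Arrays.asList("gennorm2", "-v", "-s", srcDir));
        cmd.addAll(srcFiles); // <arg line="..."> expands to one argument per file
        cmd.addAll(Arrays.asList("-o", tmpFile));
        return cmd;
    }

    /** Step 2: rewrite the temporary file as big-endian data. */
    static List<String> icupkg(String tmpFile, String dstFile) {
        return Arrays.asList("icupkg", "-tb", tmpFile, dstFile);
    }

    /** Runs one command and fails on a non-zero exit, like failonerror="true". */
    static void run(List<String> cmd) throws IOException, InterruptedException {
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        if (exit != 0) {
            throw new IOException("command failed (" + exit + "): " + cmd);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical arguments: <source dir> <temporary file> <destination file>
        List<String> files = Arrays.asList(
            "nfc.txt", "nfkc.txt", "nfkc_cf.txt", "BasicFoldings.txt",
            "DiacriticFolding.txt", "DingbatFolding.txt",
            "HanRadicalFolding.txt", "NativeDigitFolding.txt");
        run(gennorm2(args[0], files, args[1]));
        run(icupkg(args[1], args[2]));
    }
}
```

Both tools must be on the PATH, as the echo message in the build file says. Running only the first step would leave the output in whatever byte order gennorm2 produced, which the second step exists to normalize.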

It looks like ant does not execute gennorm2. If I build the utr30.nrm file 
manually using gennorm2 and replace utr30.nrm in the jar file, then starting 
solr gives the following error:
Caused by: java.lang.RuntimeException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file
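
For context, an ICU binary data file begins with a small header, and ICU4J appears to raise this message when the header's byte-order flag, charset family, character size, data format, or format version is not what it expects. The sketch below prints those fields so that a hand-built file can be compared with the one shipped in the jar. The byte offsets follow the UDataInfo layout from ICU4C's udata.h as understood here, and should be treated as an assumption; the class name IcuHeaderCheck is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class IcuHeaderCheck {
    /** Summarizes the leading header fields of an ICU binary data file. */
    static String describe(byte[] b) {
        if (b.length < 24) {
            return "too short to hold an ICU data header";
        }
        // Bytes 2 and 3 hold the magic numbers 0xDA 0x27.
        if ((b[2] & 0xff) != 0xda || (b[3] & 0xff) != 0x27) {
            return "magic bytes 0xDA 0x27 not found";
        }
        boolean bigEndian = b[8] != 0;      // isBigEndian flag
        int charsetFamily = b[9] & 0xff;    // 0 is the ASCII family
        int sizeofUChar = b[10] & 0xff;     // normally 2
        String dataFormat = new String(b, 12, 4, StandardCharsets.US_ASCII);
        String formatVersion = (b[16] & 0xff) + "." + (b[17] & 0xff) + "."
            + (b[18] & 0xff) + "." + (b[19] & 0xff);
        return "bigEndian=" + bigEndian
            + " charsetFamily=" + charsetFamily
            + " sizeofUChar=" + sizeofUChar
            + " dataFormat=" + dataFormat
            + " formatVersion=" + formatVersion;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a .nrm file, e.g. the rebuilt utr30.nrm
        System.out.println(describe(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Running it once on the rebuilt file and once on the utr30.nrm extracted from the original jar makes differences easy to spot. If bigEndian is false, the icupkg -tb step was probably skipped; if formatVersion differs, the ICU4C tools may not match the ICU4J version bundled with Solr.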

My questions are:
1. If the above code in the build file does not get executed, then how is 
the utr30 file generated?
2. How do I change a character mapping?

