lucene-java-user mailing list archives

From alx...@aim.com
Subject char mapping in lucene-icu
Date Sat, 15 Feb 2014 00:48:26 GMT

Hello,

I am trying to use the lucene-icu library in solr-4.6.1, and I need to change a character
mapping in it. I have made changes to

lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt

and built the jar file using ant, but it did not help.

I took a look at lucene/analysis/icu/build.xml and saw these lines:

  <property name="gennorm2.src.files"
    value="nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt DiacriticFolding.txt DingbatFolding.txt HanRadicalFolding.txt NativeDigitFolding.txt"/>
  <property name="gennorm2.tmp" value="${build.dir}/gennorm2/utr30.tmp"/>
  <property name="gennorm2.dst" value="${resources.dir}/org/apache/lucene/analysis/icu/utr30.nrm"/>
  <target name="gennorm2" depends="gen-utr30-data-files">
    <echo>Note that the gennorm2 and icupkg tools must be on your PATH. These tools
are part of the ICU4C package. See http://site.icu-project.org/ </echo>
    <mkdir dir="${build.dir}/gennorm2"/>
    <exec executable="gennorm2" failonerror="true">
      <arg value="-v"/>
      <arg value="-s"/>
      <arg value="${utr30.data.dir}"/>
      <arg line="${gennorm2.src.files}"/>
      <arg value="-o"/>
      <arg value="${gennorm2.tmp}"/>
    </exec>
    <!-- now convert binary file to big-endian -->
    <exec executable="icupkg" failonerror="true">
      <arg value="-tb"/>
      <arg value="${gennorm2.tmp}"/>
      <arg value="${gennorm2.dst}"/>
    </exec>
    <delete file="${gennorm2.tmp}"/>
  </target>

It looks like ant does not execute gennorm2. If I build the utr30.nrm file manually using
gennorm2 and replace utr30.nrm in the jar file with it, then starting solr gives the following error:

Caused by: java.lang.RuntimeException: java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file
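
The build.xml excerpt above converts the gennorm2 output to big-endian with icupkg -tb before it is placed in the resources directory. One thing worth checking (an assumption, not a confirmed diagnosis) is whether a manually built utr30.nrm skipped that step. The sketch below prints the fixed header fields of an ICU binary data file so the endianness flag and format version can be inspected. The class name IcuHeaderCheck is hypothetical and not part of Lucene or ICU4J. The byte offsets follow the ICU data header layout as I understand it from ICU4C's udata.h, so verify them against your ICU version.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Lucene or ICU4J) that reports ICU data header fields. */
final class IcuHeaderCheck {

    /**
     * Describes the first 24 bytes of an ICU binary data file.
     * Assumed layout: bytes 2-3 magic (0xda 0x27), byte 8 isBigEndian,
     * byte 9 charsetFamily, byte 10 sizeofUChar, bytes 12-15 dataFormat,
     * byte 16 major format version.
     */
    static String describe(byte[] h) {
        if (h.length < 24) {
            return "too short to contain an ICU data header";
        }
        if ((h[2] & 0xff) != 0xda || (h[3] & 0xff) != 0x27) {
            return "bad magic bytes: not an ICU data file";
        }
        boolean bigEndian = h[8] == 1;
        int charsetFamily = h[9] & 0xff;
        int sizeofUChar = h[10] & 0xff;
        String dataFormat = new String(h, 12, 4, StandardCharsets.US_ASCII);
        int formatVersion = h[16] & 0xff;
        return String.format(
            "bigEndian=%b charsetFamily=%d sizeofUChar=%d dataFormat=%s formatVersion=%d",
            bigEndian, charsetFamily, sizeofUChar, dataFormat, formatVersion);
    }

    public static void main(String[] args) throws IOException {
        byte[] header = new byte[24];
        // readFully throws EOFException if the file is shorter than 24 bytes.
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            in.readFully(header);
        }
        System.out.println(describe(header));
    }
}
```

After compiling, run it as java IcuHeaderCheck path/to/utr30.nrm, once on the file shipped in the original jar and once on the manually built one, and compare the two lines. A difference in bigEndian would point at the missing icupkg -tb step, while a difference in formatVersion would suggest the installed ICU4C tools are newer than the ICU4J version bundled with Solr.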

My questions are:
 1. If the above code in the build file does not get executed, how is the utr30.nrm file generated?
 2. How do I change a character mapping?


Thanks.
Alex.

