jackrabbit-users mailing list archives

From go canal <goca...@yahoo.com>
Subject Re: full text search for CJK languages
Date Sun, 09 Aug 2009 14:38:22 GMT
Just tested:
  the default configuration supports full-text search of CJK content in plain text, Word and PowerPoint files,
but PDF and Excel files cannot be searched.

 rgds,
canal




________________________________
From: go canal <gocanal@yahoo.com>
To: users@jackrabbit.apache.org
Sent: Sunday, August 9, 2009 10:20:28 PM
Subject: full text search for CJK languages

Hi,
I could not find detailed information on supporting full-text search for double-byte languages such as CJK
(Chinese, Japanese and Korean).

1) Does anybody know whether such a library is available? and
2) How do I configure this in Jackrabbit? Should I replace all the extractors in the current configuration:
    <SearchIndex .....
      <param name="textFilterClasses"
        value="org.apache.jackrabbit.extractor.PlainTextExtractor,
               org.apache.jackrabbit.extractor.MsWordTextExtractor,
               org.apache.jackrabbit.extractor.MsExcelTextExtractor,
               org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
               org.apache.jackrabbit.extractor.PdfTextExtractor,
               org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
               org.apache.jackrabbit.extractor.RTFTextExtractor,
               org.apache.jackrabbit.extractor.HTMLTextExtractor,
               org.apache.jackrabbit.extractor.XMLTextExtractor" />
    </SearchIndex>
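
For context on question 2: the extractors only turn a file format into plain text, while the way that text is split into searchable terms is decided by the Lucene analyzer, which SearchIndex takes as a separate "analyzer" param. A rough sketch of what that might look like follows. It is untested here and rests on assumptions: that Lucene's contrib CJKAnalyzer (lucene-analyzers jar) is on the repository classpath, that it can be instantiated by the Jackrabbit version in use, and that the index path shown matches the usual workspace layout.

```xml
<!-- Sketch only: the analyzer choice is an assumption, not something verified in this thread. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Tokenization is set here, independently of the text extractors. -->
  <param name="analyzer" value="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
  <!-- The textFilterClasses param would stay as listed in the configuration above. -->
</SearchIndex>
```

If the analyzer is changed, content that is already indexed would presumably need to be re-indexed before searches reflect the new tokenization.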
rgds,
canal


      