jackrabbit-users mailing list archives

From Thomas Müller <thomas.muel...@day.com>
Subject Re: full text search for CJK languages
Date Mon, 10 Aug 2009 08:36:25 GMT
Hi,

I'm not sure, but I think you need to use

class org.apache.lucene.analysis.cjk.CJKAnalyzer

See the "analyzer" parameter described at http://wiki.apache.org/jackrabbit/Search
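
For reference, a minimal sketch of where that parameter would go, assuming the usual SearchIndex element in workspace.xml (or in the Workspace template of repository.xml). The path value is the stock default and is only illustrative, and I have not tested this against CJK content:

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Assumption: this overrides the default StandardAnalyzer. The jar that
       contains the Lucene CJK analyzer may need to be added to the classpath. -->
  <param name="analyzer" value="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
</SearchIndex>
```

As far as I can tell the textFilterClasses list should not need to change, since the extractors only pull text out of the files and the analyzer does the tokenizing. Content that was indexed with the previous analyzer would most likely need to be re-indexed before the change shows up in search results.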

Can you please verify this is correct? I will then update the documentation.

Regards,
Thomas


On Sun, Aug 9, 2009 at 4:38 PM, go canal <gocanal@yahoo.com> wrote:
> Just tested:
> the default configuration supports full-text search of CJK content in text, Word and PPT files,
> but cannot search PDF/Excel files.
>
>  rgds,
> canal
>
> ________________________________
> From: go canal <gocanal@yahoo.com>
> To: users@jackrabbit.apache.org
> Sent: Sunday, August 9, 2009 10:20:28 PM
> Subject: full text search for CJK languages
>
> Hi,
> I could not find detailed info on supporting full-text search for 2-byte languages like
> CJK (Chinese, Japanese and Korean).
>
> 1) Does anybody know if such a library is available? and
> 2) How do I configure this in Jackrabbit? Should I replace all the extractors in the current
> configuration:
>    <SearchIndex .....
>      <param name="textFilterClasses"
>        value="org.apache.jackrabbit.extractor.PlainTextExtractor,
>          org.apache.jackrabbit.extractor.MsWordTextExtractor,
>          org.apache.jackrabbit.extractor.MsExcelTextExtractor,
>          org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
>          org.apache.jackrabbit.extractor.PdfTextExtractor,
>          org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
>          org.apache.jackrabbit.extractor.RTFTextExtractor,
>          org.apache.jackrabbit.extractor.HTMLTextExtractor,
>          org.apache.jackrabbit.extractor.XMLTextExtractor" />
>    </SearchIndex>
> rgds,
> canal
>
>
>
