jackrabbit-dev mailing list archives

From Nicolas <Nicolas.Modr...@macnica.com>
Subject text filters ...
Date Mon, 01 May 2006 09:19:16 GMT
Hi all,

I am trying to get Jackrabbit to index PDFs with Japanese content.
Jackrabbit works really well when indexing content stored in nodes
as text, even Japanese (using Lucene's CJK analyzer), and I also
get proper results when searching English text in PDF documents (using
the PDF text filter classes).

But I cannot get the search to return anything from PDFs with Japanese
text.
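
One way to narrow this down is to check whether the text extracted from the PDF contains any Japanese characters at all before it reaches the analyzer. Below is a minimal sketch of such a check, assuming the extracted text is available as a Java String. The class and method names are hypothetical and not part of Jackrabbit or Lucene, and Han characters are also shared with Chinese text, so this only distinguishes CJK content from Latin-only or empty output.

```java
public class JapaneseTextCheck {

    /**
     * Returns true if the text contains at least one Hiragana,
     * Katakana, or Han (kanji) character.
     * Hypothetical helper for debugging, not a Jackrabbit API.
     */
    static boolean containsJapanese(String text) {
        return text.codePoints().anyMatch(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            return script == Character.UnicodeScript.HIRAGANA
                    || script == Character.UnicodeScript.KATAKANA
                    || script == Character.UnicodeScript.HAN;
        });
    }

    public static void main(String[] args) {
        // Sample strings standing in for text extracted from a PDF.
        System.out.println(containsJapanese("日本語のテキスト"));
        System.out.println(containsJapanese("plain English text"));
    }
}
```

If the check fails on the extracted text, the problem is likely in the PDF text extraction rather than in the analyzer or the query.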

So I wanted to write my own PdfFilter and use that class for  
debugging. Unfortunately, I cannot get my text filter class to be  
used. I am using the following for the SearchIndex configuration:

   <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
       <param name="textFilterClasses" value="my.own.PdfFilter" />
       <param name="path" value="${wsp.home}/index" />
       <!-- <param name="analyzer" value="org.apache.lucene.analysis.cjk.CJKAnalyzer"/> -->
       <param name="useCompoundFile" value="true" />
       <param name="minMergeDocs" value="100" />
       <param name="volatileIdleTime" value="3" />
       <param name="maxMergeDocs" value="100000" />
       <param name="mergeFactor" value="10" />
       <param name="bufferSize" value="10" />
   </SearchIndex>

It seems like the textFilterClasses parameter is never used.

Can anybody confirm or refute the above?
Or, even better, does anybody have hints or advice on how
to get a Japanese PDF indexed?

Thank you in advance,

Nicolas Modrzyk
