jackrabbit-users mailing list archives

From JOSE FELIX HERNANDEZ BARRIO <jose.hernan...@isthari.com>
Subject Re: Problem with textExtractor
Date Wed, 28 Apr 2010 09:19:24 GMT
I don't want to index the content of the PDF for full-text search.
Can I disable it using the configuration below?

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.PlainTextExtractor"/>
    <param name="extractorPoolSize" value="2"/>
    <param name="supportHighlighting" value="true"/>
</SearchIndex>



2010/4/28 Jukka Zitting <jukka.zitting@gmail.com>

> Hi,
>
> On Wed, Apr 28, 2010 at 10:50 AM, JOSE FELIX HERNANDEZ BARRIO
> <jose.hernandez@isthari.com> wrote:
> > I'm inserting pdf in the repository and get the exception:
> >
> > 2010-04-28 10:25:39,763 WARN [PDFStreamEngine.java] [processOperator]
> > java.io.IOException: Mapping code should be 1 or two bytes and not 4
> >      at org.apache.fontbox.cmap.CMap.addMapping(CMap.java:122)
>
> The underlying PDFBox library is having trouble with your PDF file,
> which results in a warning being logged. This is not too serious, the
> only downside is that this PDF might not show up in full text
> searches.
>
> You may want to report this to users@pdfbox.apache.org or to the
> PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.
>
> BR,
>
> Jukka Zitting
>
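
The behaviour described in the quoted reply (a parsing failure is logged as a warning, and the document is simply missing from full-text results) can be illustrated with a small, self-contained sketch. This is not Jackrabbit's or PDFBox's actual code: the class `ExtractionSketch`, the `Extractor` interface, and the `extractOrEmpty` helper are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

public class ExtractionSketch {
    private static final Logger LOG = Logger.getLogger("ExtractionSketch");

    /** Hypothetical stand-in for a text extractor such as a PDF parser. */
    public interface Extractor {
        String extract(byte[] data) throws IOException;
    }

    /**
     * Returns the extracted text, or an empty string when extraction fails.
     * A failure is logged as a warning and not rethrown, so storing the
     * document still succeeds; it just contributes no full-text content.
     */
    public static String extractOrEmpty(Extractor extractor, byte[] data) {
        try {
            return extractor.extract(data);
        } catch (IOException e) {
            LOG.warning("Text extraction failed: " + e.getMessage());
            return "";
        }
    }

    public static void main(String[] args) {
        // Simulates a parser that rejects the file, as in the quoted log line.
        Extractor broken = data -> {
            throw new IOException("Mapping code should be 1 or two bytes and not 4");
        };
        // Simulates a plain-text extractor that always succeeds.
        Extractor plain = data -> new String(data, StandardCharsets.UTF_8);

        byte[] hello = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("[" + extractOrEmpty(broken, hello) + "]");
        System.out.println("[" + extractOrEmpty(plain, hello) + "]");
    }
}
```

In this sketch the failing extractor yields an empty string, so there is nothing to index for that file, while the succeeding one returns the text unchanged.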



-- 
Jose Hernandez
675599600
Isthari
http://www.isthari.com
