jackrabbit-users mailing list archives

From JOSE FELIX HERNANDEZ BARRIO <jose.hernan...@isthari.com>
Subject Re: Problem with textExtractor
Date Wed, 28 Apr 2010 09:23:41 GMT
Is there any limit on the size of PDF files the extractor can handle?

We're working with files of around 16 MB.

2010/4/28 JOSE FELIX HERNANDEZ BARRIO <jose.hernandez@isthari.com>

> I don't want to index the content of PDF files for full-text search.
> Can I disable it using the configuration below?
>
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>     <param name="path" value="${wsp.home}/index"/>
>     <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.PlainTextExtractor"/>
>     <param name="extractorPoolSize" value="2"/>
>     <param name="supportHighlighting" value="true"/>
> </SearchIndex>
>
>
>
> 2010/4/28 Jukka Zitting <jukka.zitting@gmail.com>
>
>> Hi,
>>
>> On Wed, Apr 28, 2010 at 10:50 AM, JOSE FELIX HERNANDEZ BARRIO
>> <jose.hernandez@isthari.com> wrote:
>> > I'm inserting PDF files into the repository and getting this exception:
>> >
>> > 2010-04-28 10:25:39,763 WARN [PDFStreamEngine.java] [processOperator]
>> > java.io.IOException: Mapping code should be 1 or two bytes and not 4
>> >      at org.apache.fontbox.cmap.CMap.addMapping(CMap.java:122)
>>
>> The underlying PDFBox library is having trouble with your PDF file,
>> which results in a warning being logged. This is not too serious, the
>> only downside is that this PDF might not show up in full text
>> searches.
>>
>> You may want to report this to users@pdfbox.apache.org or to the
>> PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.
>>
>> BR,
>>
>> Jukka Zitting
>>
>
>
>
> --
> Jose Hernandez
> 675599600
> Isthari
> http://www.isthari.com
>



