jackrabbit-users mailing list archives

From Fabiano Nunes <falha...@gmail.com>
Subject Re: Text extractors doesn't work correctly
Date Mon, 20 Jul 2009 15:16:10 GMT
Sending from the correct address.

About the SQL plain-text file: the extractors don't use the file extension but
the MIME type instead. So you need to map the sql extension to text/plain in
"org\apache\jackrabbit\server\io\mimetypes.properties".
See more in http://wiki.apache.org/jackrabbit/TextExtractorExamples
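To illustrate why the extension only matters indirectly, here is a minimal Java sketch. It assumes that mimetypes.properties holds `extension=mime/type` lines (so the fix would be a line such as `sql=text/plain`) and that an unmapped extension falls back to `application/octet-stream`. The class `ExtensionMimeLookup` and the `EXTRACTABLE` set are hypothetical stand-ins, not Jackrabbit's own resolver or extractor registry.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch: NOT Jackrabbit's resolver, only the same idea.
// The extractor is chosen by MIME type; the extension only matters
// because it is first mapped to a MIME type.
class ExtensionMimeLookup {

    // Assumed fallback for extensions with no mapping.
    static final String DEFAULT_MIME = "application/octet-stream";

    // Assumed set of MIME types that have a text extractor registered.
    static final Set<String> EXTRACTABLE = Set.of("text/plain", "application/pdf");

    private final Properties mappings = new Properties();

    ExtensionMimeLookup(String propertiesText) {
        try {
            // Parse "extension=mime/type" lines, as in a .properties file.
            mappings.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Map a file name to a MIME type via its (lower-cased) extension.
    String mimeTypeOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return DEFAULT_MIME;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return mappings.getProperty(ext, DEFAULT_MIME);
    }

    // A file's text is indexed only if its MIME type has an extractor.
    boolean isIndexed(String fileName) {
        return EXTRACTABLE.contains(mimeTypeOf(fileName));
    }

    public static void main(String[] args) {
        ExtensionMimeLookup before =
                new ExtensionMimeLookup("txt=text/plain\npdf=application/pdf\n");
        ExtensionMimeLookup after =
                new ExtensionMimeLookup("txt=text/plain\npdf=application/pdf\nsql=text/plain\n");
        System.out.println(before.mimeTypeOf("query.sql")); // application/octet-stream
        System.out.println(after.mimeTypeOf("query.sql"));  // text/plain
    }
}
```

The two files in the question have identical content, yet only the one whose extension maps to text/plain reaches the plain-text extractor; adding the single sql mapping is what changes that.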

PDFBox 0.72 doesn't work properly with some PDF documents. See more in
https://issues.apache.org/jira/browse/PDFBOX-361.
So I wrote an extractor (a copy of the original, in fact) based on the trunk
version of PDFBox. Furthermore, the trunk version is faster than 0.72.

On Sun, Jul 19, 2009 at 5:35 PM, Vjger <mariner@libero.it> wrote:

>
> Hi to all.
> I'm using Jackrabbit 1.5.5 and in my classpath I have
> jackrabbit-text-extractors-1.5.0.jar
>
> Well, I noticed two problems.
>
> 1) The plain text extractor depends on the file extension: in my workspace
> I have two nt:file nodes, one with a .txt extension and the other with a
> .sql extension. The SQL CONTAINS function finds only the first, even though
> the two files are identical (apart from the extension).
>
> 2) The PDF extractor does not work correctly: with two different PDF files
> it did not find the searched text.
>
> Any suggestions?
>
> Thanks in advance
> --
> View this message in context:
> http://www.nabble.com/Text-extractors-doesn%27t-work-correctly-tp24560696p24560696.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
>
>
