From: Fabiano Nunes
Date: Sun, 19 Jul 2009 18:21:15 -0300
Subject: Re: Text extractors doesn't work correctly
To: users@jackrabbit.apache.org

PDFBox 0.72 doesn't work properly with some PDF documents. See
https://issues.apache.org/jira/browse/PDFBOX-361 for details. So I wrote an
extractor (a copy of the original, in fact) based on the trunk version of
PDFBox. The trunk version is also faster than 0.72.

On Sun, Jul 19, 2009 at 5:35 PM, Vjger wrote:

> Hi to all.
> I'm using Jackrabbit 1.5.5 and in my classpath I have
> jackrabbit-text-extractors-1.5.0.jar
>
> Well, I noticed two problems.
>
> 1) The plain text extractor depends on the file extension: in my
> workspace I have two nt:file nodes, one with a .txt extension and the
> other with a .sql extension. The SQL contains function finds only the
> first, even though the two files are identical (apart from the extension).
>
> 2) The PDF extractor does not work correctly: with two different PDF
> files it did not find the searched text.
>
> Any suggestions?
>
> Thanks in advance
> --
> View this message in context:
> http://www.nabble.com/Text-extractors-doesn%27t-work-correctly-tp24560696p24560696.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.