From: Jukka Zitting
Date: Wed, 25 Nov 2009 14:26:41 +0100
Subject: Re: detect a failed text extraction?
To: dev@jackrabbit.apache.org

Hi,

On Tue, Nov 24, 2009 at 8:53 PM, Paco Avila wrote:
> There is any way to detect a failed text extraction ? I know, I can
> see the log but the failure it not associated to a file or path.
> [...]
> I have posted this question in the user list, but I think it is
> interesting talking about how it can be achieved.

Could we solve this by improving the logging in the indexer, so that
a failed extraction is reported together with the path of the
affected node?

Alternatively, if you don't have easy access to the log files, we
could possibly inject a special unique term into the index as a
marker of failed text extraction. That way you could query for all
nodes for which text extraction failed.

Finally, as a debugging tool we could add a feature to the Jackrabbit
webapp that allows you to download the extracted text content of a
binary instead of the binary itself. We'd simply run a new text
extraction pass on the stored binary and return the extracted text or
any encountered errors to the client.

BR,

Jukka Zitting
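
PS. To make the logging and marker-term ideas a bit more concrete, here
is a rough sketch in Python. It is only a toy in-memory model, not
Jackrabbit or Lucene code: the names ToyIndex, extract_text,
index_binary and EXTRACTION_FAILED_MARKER are all made up for
illustration, and the "extractor" is just a UTF-8 decode standing in
for a real text extractor.

```python
import logging

log = logging.getLogger("toy.indexer")

# Hypothetical marker term; a real one would have to be a term that
# can never occur in normal extracted content.
EXTRACTION_FAILED_MARKER = "__text_extraction_failed__"


class ToyIndex:
    """Tiny inverted index mapping each term to a set of node paths."""

    def __init__(self):
        self.postings = {}

    def add(self, path, terms):
        for term in terms:
            self.postings.setdefault(term, set()).add(path)

    def query(self, term):
        return sorted(self.postings.get(term, set()))


def extract_text(data):
    """Stand-in extractor: decodes UTF-8, raises on undecodable bytes."""
    return data.decode("utf-8")


def index_binary(index, path, data):
    try:
        terms = extract_text(data).lower().split()
    except Exception as exc:
        # Log the failure together with the node path, then index the
        # marker term instead of the (unavailable) extracted text.
        log.warning("Text extraction failed for %s: %s", path, exc)
        terms = [EXTRACTION_FAILED_MARKER]
    index.add(path, terms)


index = ToyIndex()
index_binary(index, "/docs/good.txt", b"hello jackrabbit")
index_binary(index, "/docs/broken.pdf", b"\xff\xfe\x00 not valid utf-8")

# Querying for the marker term lists every node whose extraction failed.
failed = index.query(EXTRACTION_FAILED_MARKER)
print(failed)
```

The point of the marker is that failed nodes become queryable through
the normal search path, so you would not need access to the log files
at all. Whether a single marker term is enough, or whether the error
message should be stored somewhere as well, is an open question.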