jackrabbit-dev mailing list archives

From Paco Avila <monk...@gmail.com>
Subject Re: detect a failed text extraction?
Date Wed, 25 Nov 2009 17:00:04 GMT
On Wed, Nov 25, 2009 at 2:26 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
> On Tue, Nov 24, 2009 at 8:53 PM, Paco Avila <monkiki@gmail.com> wrote:
>> Is there any way to detect a failed text extraction? I know I can
>> check the log, but the failure is not associated with a file or path.
>> [...]
>> I have posted this question on the user list, but I think it is
>> worth discussing how it can be achieved.
> Could we solve this by improving the level of logging in the indexer?
> Alternatively, if you don't have easy access to the log files, we
> could possibly inject some special unique term to the index as a
> marker of failed text extraction. That way you could query for all
> nodes for which text extraction failed.

Increasing the log level can be a good approach: the objective is to link
a failed text extraction with a node path. This way, I can see whether a
submitted document has failed in the text extraction process. The
other approach (injecting a special term) is also very neat because I
can get a list of the documents whose indexing failed from an XPath
query. Both solutions can be combined to improve the Jackrabbit
experience: the XPath query gives a list of unindexed documents and the
log can help to find out what failed in the text extraction.
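
To make the combination concrete, here is a rough sketch of how I imagine
it, not how Jackrabbit does it today. Everything in it is hypothetical:
the ExtractionGuard class, the marker term "textextractionfailed" and the
choice of nt:resource as the node type to query are only assumptions for
illustration.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Jackrabbit: ties a failed extraction to a node path.
final class ExtractionGuard {
    private static final Logger LOG = Logger.getLogger(ExtractionGuard.class.getName());

    // Assumed marker: a single lowercase token, so a typical analyzer keeps it intact.
    static final String FAILURE_MARKER = "textextractionfailed";

    // Runs the extraction; on failure logs the node path together with the cause
    // and returns the marker so that it is indexed in place of the missing text.
    static String extractOrMark(String nodePath, Callable<String> extraction) {
        try {
            return extraction.call();
        } catch (Exception e) {
            LOG.log(Level.WARNING, "Text extraction failed for " + nodePath, e);
            return FAILURE_MARKER;
        }
    }

    // Builds an XPath query (JCR 1.0 syntax) that lists the nodes carrying the marker.
    static String failedExtractionQuery() {
        return "//element(*, nt:resource)[jcr:contains(., '" + FAILURE_MARKER + "')]";
    }
}
```

The query string would be passed to the normal JCR query API, and the
log entry for each returned path would then explain why it failed.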

> Finally, as a debugging tool we could add a feature to the Jackrabbit
> webapp that allows you to download the extracted text content of a
> binary instead of the binary itself. We'd simply run a new text
> extraction pass on the stored binary and return the extracted text or
> any encountered errors to the client.

This could also be interesting.
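
As a minimal sketch of what such a debugging helper might return (the
ExtractionDebug class and its Extractor interface are made up for this
example and are not an existing webapp feature):

```java
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical debugging helper, not an existing Jackrabbit webapp feature.
final class ExtractionDebug {
    // Stand-in for whatever extractor the indexer uses (assumed interface).
    interface Extractor {
        String extract(InputStream binary) throws Exception;
    }

    // Re-runs extraction on the stored binary and returns either the
    // extracted text or the full stack trace of the failure as plain text.
    static String extractedTextOrError(InputStream binary, Extractor extractor) {
        try {
            return extractor.extract(binary);
        } catch (Exception e) {
            StringWriter trace = new StringWriter();
            e.printStackTrace(new PrintWriter(trace, true));
            return "Text extraction failed:\n" + trace;
        }
    }
}
```

The webapp could send the returned string back as text/plain instead of
the binary itself.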

> BR,
> Jukka Zitting

Paco Avila
