jackrabbit-users mailing list archives

From Sébastien Launay <sebastienlau...@gmail.com>
Subject Re: How can I access to the TextExtractor result?
Date Tue, 24 Nov 2009 19:04:24 GMT
Hi Paco,

If you are not afraid to get your hands dirty, you can use Luke [1]
and analyze the indexes found in repository/workspaces/*/index.
You might want to search the field named '_:FULLTEXT' (told you it
would get dirty ;)).
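Before opening anything in Luke you need to know which directories to point it at. The following is a minimal Java sketch that lists candidate directories under repository/workspaces/*/index. It assumes (this is not confirmed in the thread) that each workspace index directory contains one or more Lucene sub-index directories, and that each is recognizable by a file whose name starts with "segments". The class name FindLuceneIndexes and the default "repository" path are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: lists directories under <repoHome>/workspaces/*/index
// that look like Lucene indexes, i.e. candidates to open in Luke.
public class FindLuceneIndexes {

    static List<Path> findIndexDirs(Path repoHome) throws IOException {
        Path workspaces = repoHome.resolve("workspaces");
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(workspaces)) {
            return result;
        }
        // One subdirectory per workspace, each expected to hold an "index" directory.
        try (DirectoryStream<Path> wsDirs = Files.newDirectoryStream(workspaces, Files::isDirectory)) {
            for (Path ws : wsDirs) {
                Path index = ws.resolve("index");
                if (!Files.isDirectory(index)) {
                    continue;
                }
                // Assumption: a Lucene index directory contains a segments* file.
                try (Stream<Path> walk = Files.walk(index)) {
                    walk.filter(Files::isDirectory)
                        .filter(FindLuceneIndexes::hasSegmentsFile)
                        .forEach(result::add);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    // True if the directory contains at least one file matching "segments*".
    static boolean hasSegmentsFile(Path dir) {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "segments*")) {
            return files.iterator().hasNext();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Repository home directory; "repository" is only a placeholder default.
        Path repoHome = Paths.get(args.length > 0 ? args[0] : "repository");
        for (Path dir : findIndexDirs(repoHome)) {
            System.out.println(dir);
        }
    }
}
```

Pass the repository home directory as the first argument, then open each printed directory in Luke and look at the '_:FULLTEXT' field there.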

[1] http://code.google.com/p/luke/

2009/11/24 Paco Avila <monkiki@gmail.com>:
> Thanks, this is the expected answer :(
> Anyway, is there any way to detect a failed text extraction? I know
> I can see the log, but the failure is not associated with a file or path.
> Sometimes when I upload a document (Word, PDF, etc.) to my DMS built
> on Jackrabbit, it is not indexed. Office documents seem to be
> especially problematic due to their proprietary format. And the problem is
> that I don't know which document had problems in its text
> extraction, especially if I use extractorPoolSize > 1.
> Perhaps this question should be sent to the development list? I think
> this could be a very useful improvement to Jackrabbit.

Sébastien Launay
