jackrabbit-users mailing list archives

From Paco Avila <monk...@gmail.com>
Subject Re: How can I access to the TextExtractor result?
Date Tue, 24 Nov 2009 19:51:11 GMT
Very interesting :)

On Tue, Nov 24, 2009 at 8:04 PM, Sébastien Launay
<sebastienlaunay@gmail.com> wrote:
> Hi Paco,
>
> If you are not afraid to get your hands dirty you can use Luke [1]
> and analyze the indexes found in repository/workspaces/*/index.
> You might want to search the field named '_:FULLTEXT' (told you it
> will get dirty ;)).
>
> [1] http://code.google.com/p/luke/
>
> 2009/11/24 Paco Avila <monkiki@gmail.com>:
>> Thanks, this is the expected answer :(
>>
>> Anyway, is there any way to detect a failed text extraction? I know
>> I can check the log, but the failure is not associated with a file or path.
>>
>> Sometimes when I upload a document (Word, PDF, etc.) to my DMS built
>> on Jackrabbit, it is not indexed. Office documents seem to be
>> especially problematic due to their proprietary format. And the problem is
>> that I don't know which document had problems in its text
>> extraction, especially if I use extractorPoolSize > 1.
>>
>> Perhaps this question should be sent to the development list? I think
>> this could be a very useful improvement to Jackrabbit.
>
> --
> Sébastien Launay
>
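
As a rough sketch of the first step Sébastien describes, the index
directories matching repository/workspaces/*/index could be listed like
this before opening them in Luke. The class name IndexLocator, the
method findIndexDirs and the default "repository" home are assumptions
made up for illustration; only the path layout comes from the message
above, and the sketch uses plain java.nio, not any Jackrabbit or Lucene API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists the per-workspace index directories.
public class IndexLocator {

    // Returns every existing <repoHome>/workspaces/<name>/index directory, sorted.
    static List<Path> findIndexDirs(Path repoHome) throws IOException {
        List<Path> result = new ArrayList<>();
        Path workspaces = repoHome.resolve("workspaces");
        if (!Files.isDirectory(workspaces)) {
            return result; // no workspaces folder, nothing to inspect
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(workspaces)) {
            for (Path workspace : stream) {
                Path index = workspace.resolve("index");
                if (Files.isDirectory(index)) {
                    result.add(index);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: a repository home named "repository" in the working directory.
        Path repoHome = Paths.get(args.length > 0 ? args[0] : "repository");
        for (Path indexDir : findIndexDirs(repoHome)) {
            System.out.println(indexDir);
        }
    }
}
```

Each printed path is a candidate to open in Luke and search on the
'_:FULLTEXT' field.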



-- 
Paco Avila
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org
