jackrabbit-users mailing list archives

From Dave Brosius <dbros...@mebigfatguy.com>
Subject Re: How can I access to the TextExtractor result?
Date Wed, 25 Nov 2009 01:34:23 GMT
I'm assuming that this is only safe when the repository is not open through
Jackrabbit; otherwise concurrent havoc will ensue.
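
For anyone following the Luke suggestion quoted below, here is a minimal sketch of how the index directories could be located first. It assumes the repository home is a local directory laid out as repository/workspaces/*/index, as described in the quoted message; the class name IndexDirs and the method findIndexDirs are made up for illustration and are not part of Jackrabbit. It only lists directories and does not open the Lucene index itself.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Jackrabbit API: lists <home>/workspaces/<name>/index.
public class IndexDirs {

    /** Returns every workspaces/<name>/index directory that exists under the given home. */
    static List<Path> findIndexDirs(Path repositoryHome) throws IOException {
        List<Path> result = new ArrayList<>();
        Path workspaces = repositoryHome.resolve("workspaces");
        if (!Files.isDirectory(workspaces)) {
            return result; // no workspaces directory, so nothing to report
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(workspaces)) {
            for (Path workspace : stream) {
                Path index = workspace.resolve("index");
                if (Files.isDirectory(index)) {
                    result.add(index);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the repository home defaults to ./repository
        Path home = Paths.get(args.length > 0 ? args[0] : "repository");
        for (Path indexDir : findIndexDirs(home)) {
            System.out.println(indexDir);
        }
    }
}
```

Run it with the repository home as the only argument, and only point Luke at the printed directories once the repository has been shut down.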

Sébastien Launay wrote:
> Hi Paco,
>
> If you are not afraid to get your hands dirty you can use Luke [1]
> and analyze the indexes found in repository/workspaces/*/index.
> You might want to search the field named '_:FULLTEXT' (told you it
> will get dirty ;)).
>
> [1] http://code.google.com/p/luke/
>
> 2009/11/24 Paco Avila <monkiki@gmail.com>:
>   
>> Thanks, this is the expected answer :(
>>
>> Anyway, is there any way to detect a failed text extraction? I know
>> I can check the log, but the failure is not associated with a file or path.
>>
>> Sometimes when I upload a document (Word, PDF, etc.) to my DMS built
>> on Jackrabbit, it is not indexed. Office documents seem to be
>> especially problematic due to their proprietary format. And the problem is
>> that I don't know which documents had problems in their text
>> extraction, especially if I use extractorPoolSize > 1.
>>
>> Perhaps this question should be sent to the development list? I think
>> this could be a very useful improvement to Jackrabbit.
>>     
>
>   

