jackrabbit-users mailing list archives

From Paco Avila <monk...@gmail.com>
Subject Re: How to know if a document is well indexed or not
Date Fri, 23 Jul 2010 10:57:48 GMT
I'm also interested in a solution. A source code modification is probably
needed, so I will have to dig into the source code to find a reasonable
solution. The main problem is that the text extractor does not know the
source file name, which would be needed for a possible
text_extraction_error.log file :(
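A minimal sketch of that idea, assuming the indexing code could pass the document path down to the extractor (which, as said above, it currently does not): a thin wrapper catches any extraction failure and appends one line per failed file to an error log. `TextExtractor`, `LoggingExtractor` and the tab-separated log format are hypothetical names made up for illustration, not Jackrabbit or Tika API.

```java
import java.io.InputStream;
import java.io.PrintWriter;

// Hypothetical extractor interface for this sketch; NOT Jackrabbit's or Tika's API.
interface TextExtractor {
    String extract(InputStream in) throws Exception;
}

// Hypothetical wrapper: assumes the caller knows the document path and passes it in,
// which is exactly the information the current text extractor does not receive.
final class LoggingExtractor {
    private final TextExtractor delegate;
    private final PrintWriter errorLog; // e.g. opened on text_extraction_error.log

    LoggingExtractor(TextExtractor delegate, PrintWriter errorLog) {
        this.delegate = delegate;
        this.errorLog = errorLog;
    }

    /**
     * Returns the extracted text, or an empty string if extraction failed.
     * Each failure is logged as "path<TAB>rootCauseClass: message".
     */
    String extract(String path, InputStream in) {
        try {
            return delegate.extract(in);
        } catch (Exception e) { // checked and unchecked exceptions alike
            Throwable root = rootCause(e);
            errorLog.println(path + "\t" + root.getClass().getName() + ": " + root.getMessage());
            errorLog.flush();
            return "";
        }
    }

    // Walks the cause chain down to the innermost exception; the depth
    // limit guards against pathological cause cycles.
    private static Throwable rootCause(Throwable t) {
        Throwable root = t;
        for (int i = 0; i < 32 && root.getCause() != null; i++) {
            root = root.getCause();
        }
        return root;
    }
}
```

Logging the root cause rather than the outer exception matters here: in the quoted trace the useful message ("Value type of property ID 1 is not VT_I2 but 2048.") is on POI's `HPSFRuntimeException`, which Tika wraps in a generic `TikaException`.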

On Thu, Jul 22, 2010 at 1:13 PM, taha ben salah <taha.bensalah@gmail.com> wrote:
> Hi,
> I found that some documents failed to be indexed in Lucene.
> In particular, some Office 2003 documents failed to be parsed (by Tika's
> Office parser).
> You can find the stack trace at the end of this message.
> I wonder if there is a way to catch that exception (indexing is done in an
> asynchronous thread and the error is only written to the log).
> It would be even better if we could find out (using some public API) the
> indexing status of documents (indexed / not yet indexed / failed).
> Any suggestion is very welcome.
> Thanks in advance.
> Taha
>
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@ced1ac
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>        at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
>        at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:195)
>        at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.poi.hpsf.HPSFRuntimeException: Value type of property ID 1 is not VT_I2 but 2048.
>        at org.apache.poi.hpsf.Section.<init>(Section.java:262)
>        at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:452)
>        at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:247)
>        at org.apache.tika.parser.microsoft.OfficeParser.parseSummaryEntryIfExists(OfficeParser.java:148)
>        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:71)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>
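On the status question: the trace shows extraction running as `LazyTextExtractorField$ParsingTask` on a thread pool, so the exception never reaches the code that saved the document. A minimal sketch of a caller-visible status, assuming you control where extraction tasks are submitted: each task records PENDING, INDEXED or FAILED (plus the failure) in a concurrent map keyed by document path. `IndexStatus` and `ExtractionTracker` are hypothetical names; they are not part of Jackrabbit's public API.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Hypothetical status values for this sketch; Jackrabbit defines no such enum.
enum IndexStatus { PENDING, INDEXED, FAILED }

// Runs extractions asynchronously and remembers each outcome per document path,
// so a failure can be queried instead of only being written to the log.
final class ExtractionTracker {
    private final Executor executor;
    private final Map<String, IndexStatus> status = new ConcurrentHashMap<>();
    private final Map<String, Throwable> failures = new ConcurrentHashMap<>();

    ExtractionTracker(Executor executor) {
        this.executor = executor;
    }

    CompletableFuture<Void> submit(String path, Callable<String> extraction) {
        failures.remove(path);                  // forget the outcome of an earlier attempt
        status.put(path, IndexStatus.PENDING);
        return CompletableFuture.runAsync(() -> {
            try {
                extraction.call();              // a real indexer would store the returned text here
                status.put(path, IndexStatus.INDEXED);
            } catch (Exception e) {             // checked and unchecked exceptions alike
                failures.put(path, e);          // recorded first, so FAILED always has a cause
                status.put(path, IndexStatus.FAILED);
            }
        }, executor);
    }

    /** Returns null if the path was never submitted. */
    IndexStatus statusOf(String path) {
        return status.get(path);
    }

    Throwable failureOf(String path) {
        return failures.get(path);
    }
}
```

With a real pool (for example `Executors.newFixedThreadPool(2)`), `statusOf` returns `PENDING` until the task finishes; a caller can poll it or chain work onto the returned `CompletableFuture`.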



-- 
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org
