jackrabbit-users mailing list archives

From taha ben salah <taha.bensa...@gmail.com>
Subject How to know if a document is well indexed or not
Date Thu, 22 Jul 2010 11:13:03 GMT
Hi,
I found that some documents fail to be indexed in Lucene.
In particular, some Office 2003 documents fail to parse (Tika Office
parser).
The stack trace is at the end of this message.
I wonder if there is a way to catch that exception (indexing is done in an
asynchronous thread, and the error is only written to the log).
It would be even better if we could query (through some public API) the
indexing status of a document (indexed / not yet indexed / failed to index).
Any suggestions are very welcome.
Thanks in advance.
Taha
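
As an illustration of the status API asked about above, here is a minimal sketch. It is not Jackrabbit code: the class IndexingStatusTracker, its methods, and the document paths are all hypothetical, and the extraction task is a plain Callable standing in for the real Tika parse. It only shows the general idea of catching the exception inside the background task and recording a per-document status, instead of letting the error reach only the log.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, NOT part of Jackrabbit: records the outcome of
// background text-extraction tasks so that callers can query it later.
final class IndexingStatusTracker {

    enum Status { PENDING, INDEXED, FAILED }

    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private final Map<String, Throwable> failures = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Runs the extraction in a background thread and records how it ended.
    void submit(String path, Callable<String> extraction) {
        statuses.put(path, Status.PENDING);
        executor.execute(() -> {
            try {
                extraction.call();
                statuses.put(path, Status.INDEXED);
            } catch (Throwable t) {
                // Store the cause first so it is visible once FAILED is seen.
                failures.put(path, t);
                statuses.put(path, Status.FAILED);
            }
        });
    }

    // Returns null for a path that was never submitted.
    Status statusOf(String path) {
        return statuses.get(path);
    }

    // Returns null unless the extraction for this path failed.
    Throwable failureOf(String path) {
        return failures.get(path);
    }

    // Stops accepting tasks and waits for the submitted ones to finish.
    boolean shutdownAndWait(long timeoutSeconds) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        IndexingStatusTracker tracker = new IndexingStatusTracker();
        // Stand-ins for real parser calls; the second one simulates a parse failure.
        tracker.submit("/docs/ok.doc", () -> "extracted text");
        tracker.submit("/docs/broken.doc", () -> {
            throw new IllegalStateException("simulated parser failure");
        });
        tracker.shutdownAndWait(5);
        System.out.println(tracker.statusOf("/docs/ok.doc"));      // INDEXED
        System.out.println(tracker.statusOf("/docs/broken.doc"));  // FAILED
        System.out.println(tracker.failureOf("/docs/broken.doc").getMessage());
    }
}
```

In this sketch the status lives in an in-memory map, so it is lost on restart. Whether Jackrabbit exposes a hook where such a wrapper could be plugged in is exactly the open question of this message.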



org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@ced1ac
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
        at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:195)
        at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.poi.hpsf.HPSFRuntimeException: Value type of property ID 1 is not VT_I2 but 2048.
        at org.apache.poi.hpsf.Section.<init>(Section.java:262)
        at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:452)
        at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:247)
        at org.apache.tika.parser.microsoft.OfficeParser.parseSummaryEntryIfExists(OfficeParser.java:148)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:71)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
