jackrabbit-users mailing list archives

From Bokhari <jawad.bokh...@gmail.com>
Subject LazyTextExtractorField while parsing HTML or JSP files
Date Wed, 07 Apr 2010 19:24:31 GMT

Hi,

I am trying to add HTML documents to Jackrabbit, but I get the error below.
The document is added to the repository, but it isn't actually indexed.
The list of formats supported by Tika includes HTML:
http://lucene.apache.org/tika/0.7/formats.html

Thanks,

Bokhari


07.04.2010 22:22:52 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 165)
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.html.HtmlParser@b6f7f5
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
        at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:195)
        at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.nio.charset.IllegalCharsetNameException:
        at java.nio.charset.Charset.checkName(Charset.java:273)
        at java.nio.charset.Charset.lookup2(Charset.java:458)
        at java.nio.charset.Charset.lookup(Charset.java:437)
        at java.nio.charset.Charset.isSupported(Charset.java:479)
        at org.apache.tika.parser.html.HtmlParser.parse(HtmlParser.java:49)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        ... 12 more
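
The "Caused by" line has an empty message, and IllegalCharsetNameException uses the offending charset name as its message. That suggests, though the trace alone does not prove it, that an empty charset name reached Charset.isSupported. The sketch below reproduces that behaviour with the JDK only and shows one possible defensive check. CharsetGuard and its two methods are hypothetical names for illustration; they are not part of Tika or Jackrabbit.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper, for illustration only (not Tika or Jackrabbit code).
class CharsetGuard {

    // Returns true when Charset.isSupported rejects the name by throwing
    // IllegalCharsetNameException instead of simply returning false.
    static boolean rawCheckThrows(String name) {
        try {
            Charset.isSupported(name);
            return false;
        } catch (IllegalCharsetNameException e) {
            return true;
        }
    }

    // Defensive variant: null, blank, or syntactically illegal names
    // are reported as "not usable" rather than raising an exception.
    static boolean isUsableCharset(String name) {
        if (name == null || name.trim().isEmpty()) {
            return false;
        }
        try {
            return Charset.isSupported(name.trim());
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("raw check throws on empty name: " + rawCheckThrows(""));
        System.out.println("guarded check on empty name:    " + isUsableCharset(""));
        System.out.println("guarded check on UTF-8:         " + isUsableCharset("UTF-8"));
    }
}
```

If an empty name is indeed the trigger, one thing to check is whether the HTML or JSP file declares a content type whose charset value is empty or unresolved.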

-- 
View this message in context: http://n4.nabble.com/LazyTextExtractorField-while-parsing-HTML-or-JSP-files-tp1754848p1754848.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
