jackrabbit-users mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: HTML Text Extraction fails even with Jackrabbit 2.1
Date Fri, 30 Apr 2010 09:20:44 GMT

On Thu, Apr 29, 2010 at 5:26 PM, Jawad Bokhari <jawad.bokhari@gmail.com> wrote:
> Caused by: java.nio.charset.IllegalCharsetNameException:

It looks like your HTML documents declare a character encoding name
that the underlying Java platform does not accept.
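
For reference, here is a minimal sketch of how the JDK reacts to
different charset names. The class name CharsetCheck and the sample
names are made up for illustration; they are not taken from your stack
trace. Charset.forName throws IllegalCharsetNameException when the name
itself is malformed, and UnsupportedCharsetException when the name is
well-formed but no such charset is available in the JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, for illustration only.
class CharsetCheck {

    // Classifies a charset name the way Charset.forName would treat it.
    static String classify(String name) {
        try {
            Charset.forName(name);
            return "supported";
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not allowed in a charset name.
            return "illegal name";
        } catch (UnsupportedCharsetException e) {
            // The name is syntactically fine, but this JVM has no such charset.
            return "unsupported";
        }
    }

    public static void main(String[] args) {
        // Sample inputs are assumptions, not values from the original report.
        String[] samples = {
            "UTF-8",
            "text/html; charset=utf-8",
            "x-no-such-charset-12345"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Running something like this against the charset value declared in the
meta tag of a failing document may help narrow down what to put in the
bug report.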

Can you file a bug about this in
https://issues.apache.org/jira/browse/TIKA for the Tika project that
Jackrabbit nowadays uses for full text extraction? It would be great
if you could also attach a troublesome HTML file to the bug report.


Jukka Zitting
