jackrabbit-users mailing list archives

From Jawad Bokhari <jawad.bokh...@gmail.com>
Subject Re: HTML Text Extraction fails even with Jackrabbit 2.1
Date Fri, 30 Apr 2010 12:30:36 GMT
Thanks Jukka,

It seems there was something wrong with my code, although I still can't
identify what it was.
Anyway, I loaded the Jackrabbit 2.1 jar files and copied the code from
populate.jsp into my own class for importing a file, and it worked perfectly.
I tried many different HTML files and didn't see any exceptions, and
searching is also working fine.

There must be something wrong with my code. Next time I have an issue, I'll
include the relevant lines of code for reference.

Thanks for your help anyway,

Bokhari

On Fri, Apr 30, 2010 at 2:20 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:

> Hi,
>
> On Thu, Apr 29, 2010 at 5:26 PM, Jawad Bokhari <jawad.bokhari@gmail.com>
> wrote:
> > Caused by: java.nio.charset.IllegalCharsetNameException:
>
> It looks like the HTML documents you have use some character encoding
> that's not supported by the underlying Java platform.
>
> Can you file a bug about this in
> https://issues.apache.org/jira/browse/TIKA for the Tika project that
> Jackrabbit nowadays uses for full text extraction? It would be great
> if you could also attach a troublesome HTML file to the bug report.
>
> BR,
>
> Jukka Zitting
>
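
For anyone who hits the same exception: a minimal sketch of how a declared
charset name can be checked on the plain Java platform before it is used for
decoding. This is not how Tika or Jackrabbit handle it internally; the class
and method names (CharsetCheck, resolveOrDefault) and the UTF-8 fallback are
assumptions made up for illustration. Note that java.nio.charset
distinguishes a syntactically illegal name (IllegalCharsetNameException)
from a legal name that the JVM simply does not support
(UnsupportedCharsetException).

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not part of Jackrabbit or Tika.
final class CharsetCheck {

    /**
     * Resolves a charset name as declared in a document (for example in an
     * HTML meta tag). Falls back to UTF-8 when the name is missing,
     * syntactically illegal, or not supported by this JVM.
     */
    static Charset resolveOrDefault(String declaredName) {
        if (declaredName == null) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(declaredName.trim());
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not allowed in a
            // charset name, e.g. a stray quote left over from the markup.
            return StandardCharsets.UTF_8;
        } catch (UnsupportedCharsetException e) {
            // The name is well formed, but this JVM has no such charset.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveOrDefault("ISO-8859-1"));
        System.out.println(resolveOrDefault("utf-8\""));
        System.out.println(resolveOrDefault("x-no-such-charset"));
    }
}
```

Running a check like this against the charset declared in a troublesome HTML
file may help show whether the declaration itself is malformed, which would
be useful detail for a bug report.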
