tika-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: A problem in the right-to-left languages
Date Tue, 01 Nov 2011 13:07:19 GMT
On Tue, Nov 1, 2011 at 8:48 AM, Robert Muir <rcmuir@gmail.com> wrote:

> I really think tika should include the parts of icu4j it depends on.
> Often open source projects are hesitant to include icu jar because of
> its size, but that's silly since the size is just a catch-all.
> We can use the webapp to make a smaller one that includes the minimum
> of stuff Tika needs. http://apps.icu-project.org/datacustom/
>
> Maybe we should open a JIRA issue to fix this? I think it's a bug that
> Arabic and Persian text silently come out corrupted if you don't have
> this in your classpath.

+1

I think it's awful to just silently produce bad results.

Mike McCandless

http://blog.mikemccandless.com
