tika-dev mailing list archives

From Ahmad Ajiloo <ahmad.aji...@gmail.com>
Subject Re: A problem in the right-to-left languages
Date Sun, 06 Nov 2011 17:00:19 GMT
Hi
Did your inquiry produce any result?

On Wed, Nov 2, 2011 at 4:40 AM, Ken Krugler <kkrugler_lists@transpac.com> wrote:

> I know some of the original team members - I could ask.
>
> Are there specific questions, or just "is anybody still minding the fire"?
>
> -- Ken
>
> On Nov 1, 2011, at 2:43pm, Nick Burch wrote:
>
> > On Tue, 1 Nov 2011, Robert Muir wrote:
> >> Well, as an alternative to them committing the EBCDIC detection,
> >> perhaps we could look at the charset detection APIs and propose some API
> >> additions so that users (like Tika) can plug in custom detectors?
> >
> > In theory it should be pluggable, but I seem to recall we needed to tweak
> > a few core bits to get the detector working (around negative matches for
> > control characters).
> >
> > Looking at the svn version history, the ICU4J team don't appear to have
> > done any work on their character detectors in several years. From the lack
> > of responses when I asked on their list about extending them, I fear there
> > may not be anyone left in their project who's interested in charset
> > detectors any more. I'd love to be proved wrong though, if anyone has any
> > personal contacts on the project they could prod about it?
> >
> > Nick
>
> --------------------------
> Ken Krugler
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
