tika-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: A problem in the right-to-left languages
Date Wed, 02 Nov 2011 01:10:52 GMT
I know some of the original team members - I could ask.

Are there specific questions, or just "is anybody still minding the fire"?

-- Ken

On Nov 1, 2011, at 2:43pm, Nick Burch wrote:

> On Tue, 1 Nov 2011, Robert Muir wrote:
>> Well, as an alternative to them committing the EBCDIC detection, perhaps
>> we could look at the charset detection APIs and propose some API additions
>> so that users (like Tika) can plug in custom detectors?
> 
> In theory it should be pluggable, but I seem to recall we needed to tweak
> a few core bits to get the detector working (around negative matches for
> control characters).
> 
> Looking at the svn version history, the ICU4J team don't appear to have
> done any work on their character detectors in several years. From the lack
> of responses when I asked on their list about extending them, I fear there
> may not be anyone left in their project who's interested in charset
> detectors any more. I'd love to be proved wrong though, if anyone has any
> personal contacts on the project they could prod about it?
> 
> Nick

--------------------------
Ken Krugler
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr



