uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: UIMA Question
Date Wed, 05 Aug 2015 16:12:42 GMT
Hi,

as I said, UIMA in general is suitable, but the tools built on top of UIMA may not be
adapted to such languages.

I know of only a few isolated efforts towards adding support for such languages.

/me puts on DKPro hat (I'm working on that project)

E.g. the DKPro Core [1] collection of UIMA components integrates a couple of third-party tools
and models, such as Stanford CoreNLP (some Arabic), MaltParser dependency parsing (Farsi), and the
HunPos POS tagger (Farsi). However, support is very spotty: e.g. there is no tokenizer for
either of these languages in DKPro Core. Most of these tools and models I collected from across
the web and integrated. Where possible, I tried to set up at least a few basic unit tests to make
sure they do at least something, but since I speak neither Arabic nor Farsi...
well... ;)

/me takes off DKPro hat and puts on WebAnno hat (I'm also working on that project)

Recently, I've added basic (experimental) RTL support to the WebAnno annotation tool [2].
WebAnno internally uses UIMA data structures (CAS) to store annotations and is based on the
same UIMA type system as DKPro Core (plus you can define your own types in WebAnno). Unfortunately,
support for RTL languages in browsers is also rather sad. RTL support in WebAnno works best
with Safari [3].

/me takes off hats

So, you can use UIMA for these languages, and there are already a few things to build
on or get inspired by. Afaik there is no comprehensive open source NLP suite for Arabic or
Farsi (or is there?). So if you build one, that would be great, and as far as I can tell, you
should be able to interface it with UIMA.
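
To illustrate why the data representation itself is not the obstacle, here is a minimal
sketch in plain Java. It is not UIMA API: the class StandoffSketch, the Span class and the
whitespace tokenizer are hypothetical stand-ins. The sketch only assumes what UIMA
annotations also rely on, namely begin/end character offsets into a document string. Text is
stored in logical (reading) order, so the offsets work the same way for RTL scripts; only
rendering is direction-sensitive.

```java
import java.util.ArrayList;
import java.util.List;

class StandoffSketch {

    /** Hypothetical stand-in for an annotation: a type label plus begin/end offsets. */
    static final class Span {
        final String type;
        final int begin; // inclusive offset into the document string
        final int end;   // exclusive offset into the document string

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** Returns the part of the document that this span covers. */
        String coveredText(String doc) {
            return doc.substring(begin, end);
        }
    }

    /** Naive whitespace tokenizer; an assumption for illustration, not adequate for real Arabic or Farsi. */
    static List<Span> tokenize(String doc) {
        List<Span> tokens = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= doc.length(); i++) {
            boolean boundary = i == doc.length() || Character.isWhitespace(doc.charAt(i));
            if (!boundary && start < 0) {
                start = i;                              // a token begins here
            } else if (boundary && start >= 0) {
                tokens.add(new Span("Token", start, i)); // a token ends here
                start = -1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two Persian words, written as Unicode escapes to avoid source-encoding issues.
        String doc = "\u0633\u0644\u0627\u0645 \u062F\u0646\u06CC\u0627";
        for (Span t : tokenize(doc)) {
            System.out.println(t.type + " [" + t.begin + "," + t.end + ") " + t.coveredText(doc));
        }
    }
}
```

A real pipeline would replace the whitespace split with a proper tokenizer for the language
and store the spans as annotations in a CAS; the offset arithmetic stays the same.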

Cheers,

-- Richard

[1] https://dkpro.github.io/dkpro-core/releases/1.7.0/models.html
[2] https://webanno.github.io/webanno
[3] https://github.com/webanno/webanno/issues/49

On 05.08.2015, at 17:58, d.heidarpour@ut.ac.ir wrote:

> Hi,
> I have the same goal, but for Persian. Although Persian and Arabic are
> different languages, they use almost the same orthography. I'm
> planning to develop a framework with basic modules for normalizing,
> stemming, POS tagging, syntactic analysis, semantic/sentiment extraction
> and more. We are a team of about 6 or 7 students, and each
> one is developing one module as his/her own thesis. The combined result
> should be a framework for use in text/audio engineering apps and, more
> importantly, in an IR system.
> Is this architecture suitable for such a task and language?
> Thanks
> Davood Heidarpour
> 
>> Hi,
>> 
>> at the level of the internal data representation, UIMA certainly supports
>> Arabic. However, specific visualization tools or analysis components may
>> not support it. So if you want to program your own analysis with UIMA, you
>> should be ok. If you want to use UIMA out-of-the-box for Arabic or other
>> RTL languages, you might be hitting a wall.
>> 
>> If you can explain in more detail what you plan to do, maybe we can give
>> some more specific pointers.
>> 
>> Cheers,
>> 
>> -- Richard
>> 
>> On 05.08.2015, at 11:09, Khaled Zaki <khaledamir93@gmail.com> wrote:
>> 
>>> Hi,
>>> this is Khaled from Cairo University. I'm using UIMA for the first
>>> time and have a question about text mining: does UIMA support mining
>>> Arabic text, and if so, what should I do? I tried to browse an Arabic
>>> file, but it failed.
>>> Regards,
>>> Thank you in advance.
>> 
>> 
> 
> 
> 
> 

