uima-user mailing list archives

From Jannik Strötgen <jannik.stroet...@gmail.com>
Subject Re: UIMA Question
Date Wed, 05 Aug 2015 16:51:15 GMT
Hi Khaled,

We also did some work on Arabic in the context of our temporal tagger
HeidelTime. It extracts and normalizes temporal expressions for several
languages including Arabic. In addition, the HeidelTime UIMA kit
contains a UIMA wrapper for the Stanford POS tagger and annotates
sentence, token, and POS information.

HeidelTime is at GitHub:

and there is also an online demo:

All the best,

On 08/05/2015 06:12 PM, Richard Eckart de Castilho wrote:
> Hi,
> as I said, in general UIMA is suitable, but the tools that build on top of UIMA may not
> be adapted.
> I know of only a few isolated efforts towards adding support for such languages.
> /me puts on DKPro hat (I'm working on that project)
> E.g. the DKPro Core [1] collection of UIMA components integrates a couple of third-party
> tools and models such as Stanford CoreNLP (some Arabic), MaltParser dependency parsing (Farsi)
> or the HunPos POS tagger (Farsi). However, support is very spotty. E.g. there is no tokenizer for
> either of these languages available in DKPro Core. Most of these I've collected across the
> web and integrated. Where possible, I tried to set up at least a few basic unit tests to make
> sure these tools and models do at least something, but since I speak neither Arabic nor Farsi...
> well... ;)
> /me takes off DKPro hat and puts on WebAnno hat (I'm also working on that project)
> Recently, I've added basic (experimental) RTL support to the WebAnno annotation tool
> [2]. WebAnno internally uses UIMA data structures (CAS) to store annotations and is based
> on the same UIMA type system as DKPro Core (plus you can define your own types in WebAnno).
> Unfortunately, support for RTL languages in browsers is also rather sad. RTL support in WebAnno
> works best with Safari [3].
> /me takes off hats
> So, you can use UIMA for these languages, and there are already a few things there as well
> to build on or get inspired by. Afaik there is no comprehensive open source NLP suite for Arabic
> or Farsi (or is there?). So if you build one, it would be great, and as far as I can tell,
> you should be able to interface it with UIMA.
> Cheers,
> -- Richard
> [1] https://dkpro.github.io/dkpro-core/releases/1.7.0/models.html
> [2] https://webanno.github.io/webanno
> [3] https://github.com/webanno/webanno/issues/49
> On 05.08.2015, at 17:58, d.heidarpour@ut.ac.ir wrote:
>> Hi,
>> I have the same goal but for Persian. Although Persian and Arabic are
>> different languages, they use almost the same orthography. I'm
>> planning to develop a framework with basic modules for normalizing,
>> stemming, POS tagging, syntactic analysis, semantic/sentiment extraction
>> and more. We are a team of 6 or 7 students, and each
>> one is trying to develop one module as his/her own thesis. The whole effort
>> should yield a framework for use in text/audio engineering apps and, more
>> importantly, in an IR system.
>> Is this architecture suitable for such a task and language?
>> Thanks
>> Davood Heidarpour
>>> Hi,
>>> at the level of the internal data representation, UIMA certainly supports
>>> Arabic. However, specific visualization tools or analysis components may
>>> not support it. So if you want to program your own analysis with UIMA, you
>>> should be ok. If you want to use UIMA out-of-the-box for Arabic or other
>>> RTL languages, you might be hitting a wall.
>>> If you can explain in more detail what you plan to do, maybe we can give
>>> some more specific pointers.
>>> Cheers,
>>> -- Richard
>>> On 05.08.2015, at 11:09, Khaled Zaki <khaledamir93@gmail.com> wrote:
>>>> Hi,
>>>> this is Khaled from Cairo University. I'm using UIMA for the first
>>>> time and I have a question concerning text mining. I was
>>>> wondering whether UIMA supports mining the Arabic language and, if yes,
>>>> what I should do, as I have tried to browse an Arabic file but it
>>>> failed.
>>>> Regards,
>>>> Thank you in advance.
