uima-user mailing list archives

From d.heidarp...@ut.ac.ir
Subject Re: UIMA Question
Date Wed, 05 Aug 2015 19:04:16 GMT
Hi Richard,
Thanks for your information.

You're right, there is no comprehensive open source framework for Farsi.
But if you need a tokenizer for a Farsi pipeline in DKPro, there is an
open source package you can use. It is called Hazm [1] and is written in
Python. Since its tokenizer is implemented entirely with regular
expressions, you can simply port it to Java (no need for BSF).
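
To illustrate why such a port is straightforward, here is a minimal Java
sketch of a regex-only tokenizer. The pattern is a simplified assumption
of mine and not Hazm's actual rules, and the class name
RegexTokenizerSketch is hypothetical. It treats runs of letters, marks,
digits and the zero-width non-joiner (U+200C, common inside Persian
words) as one token, and every other non-space character as its own
token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: NOT Hazm's real rule set, only the general idea.
final class RegexTokenizerSketch {

    // Alternative 1: a run of letters, combining marks, decimal digits
    //                and ZWNJ (U+200C) forms one word token.
    // Alternative 2: any single remaining non-whitespace character
    //                (e.g. punctuation) forms its own token.
    private static final Pattern TOKEN = Pattern.compile(
        "[\\p{L}\\p{M}\\p{Nd}\\u200C]+|[^\\s\\p{L}\\p{M}\\p{Nd}\\u200C]");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "salam donya!" in Persian script, written with Unicode escapes
        // so the source file stays ASCII-only.
        String text = "\u0633\u0644\u0627\u0645 \u062f\u0646\u06cc\u0627!";
        for (String token : tokenize(text)) {
            System.out.println(token);
        }
    }
}
```

In a DKPro pipeline, logic like this would sit inside a UIMA annotator
that creates one token annotation per match, using m.start() and m.end()
as offsets. A faithful port would copy the patterns from the Hazm
sources instead of the one above.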

Again, Thanks for your information.

Davood

[1] - https://github.com/sobhe/hazm

> Hi Khaled,
>
> We also did some work on Arabic in the context of our temporal tagger
> HeidelTime. It extracts and normalizes temporal expressions for several
> languages including Arabic. In addition, the HeidelTime UIMA kit
> contains a UIMA wrapper for the Stanford POS tagger and annotates
> sentence, token, and pos information.
>
> HeidelTime is at GitHub:
> https://github.com/HeidelTime/heideltime
>
> and there is also an online demo:
> http://heideltime.ifi.uni-heidelberg.de/heideltime/
>
> All the best,
> Jannik
>
>
> On 08/05/2015 06:12 PM, Richard Eckart de Castilho wrote:
>> Hi,
>>
>> as I said, in general UIMA is suitable, but the tools that build on top
>> of UIMA may not be adapted.
>>
>> I know of only a few isolated efforts towards adding support for such
>> languages.
>>
>> /me puts on DKPro hat (I'm working on that project)
>>
>> E.g. the DKPro Core [1] collection of UIMA components integrates a
>> couple of third-party tools and models such as Stanford CoreNLP (some
>> Arabic), MaltParser dependency parsing (Farsi) or HunPos POS tagger
>> (Farsi). However, support is very spotty. E.g. there is no tokenizer for
>> either of these languages available in DKPro Core. Most of these, I've
>> collected across the web and integrated. Where possible, I tried to set
>> up at least a few basic unit tests to make sure these tools and models
>> do at least something, but since I speak neither Arabic nor Farsi...
>> well... ;)
>>
>> /me takes off DKPro hat and puts on WebAnno hat (I'm also working on
>> that project)
>>
>> Recently, I've added basic (experimental) RTL support to the WebAnno
>> annotation tool [2]. WebAnno internally uses UIMA data structures (CAS)
>> to store annotations and is based on the same UIMA type system as DKPro
>> Core (plus you can define your own types in WebAnno). Unfortunately,
>> support for RTL languages in browsers is also rather sad. RTL support in
>> WebAnno works best with Safari [3].
>>
>> /me takes off hats
>>
>> So, you can use UIMA for these languages, and there are already a few
>> things there to build on or get inspired by. Afaik there is no
>> comprehensive open source NLP suite for Arabic or Farsi (or is there?).
>> So if you build one, that would be great, and as far as I can tell, you
>> should be able to interface it with UIMA.
>>
>> Cheers,
>>
>> -- Richard
>>
>> [1] https://dkpro.github.io/dkpro-core/releases/1.7.0/models.html
>> [2] https://webanno.github.io/webanno
>> [3] https://github.com/webanno/webanno/issues/49
>>
>> On 05.08.2015, at 17:58, d.heidarpour@ut.ac.ir wrote:
>>
>>> Hi,
>>> I have the same goal, but for Persian. Persian and Arabic are different
>>> languages, but they use almost the same orthography. I'm planning to
>>> develop a framework with basic modules for normalizing, stemming,
>>> POS tagging, syntactic analysis, semantic/sentiment extraction and
>>> more. We are a team of six or seven students, and each one is
>>> developing one module as his/her own thesis. The whole effort should
>>> result in a framework for use in text/audio engineering apps and, more
>>> importantly, in an IR system.
>>> Is this architecture suitable for such a task and language?
>>> Thanks
>>> Davood Heidarpour
>>>
>>>> Hi,
>>>>
>>>> at the level of the internal data representation, UIMA certainly
>>>> supports Arabic. However, specific visualization tools or analysis
>>>> components may not support it. So if you want to program your own
>>>> analysis with UIMA, you should be ok. If you want to use UIMA
>>>> out-of-the-box for Arabic or other RTL languages, you might be
>>>> hitting a wall.
>>>>
>>>> If you can explain in more detail what you plan to do, maybe we can
>>>> give some more specific pointers.
>>>>
>>>> Cheers,
>>>>
>>>> -- Richard
>>>>
>>>> On 05.08.2015, at 11:09, Khaled Zaki <khaledamir93@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> this is Khaled from Cairo University. I'm using UIMA for the first
>>>>> time and I have a question regarding text mining: I was wondering
>>>>> whether UIMA supports mining the Arabic language and, if so, what I
>>>>> should do. I have tried to browse an Arabic file, but it failed.
>>>>> Regards,
>>>>> Thank you in advance.
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>




