uima-user mailing list archives

From Martin Wunderlich <martin...@gmx.net>
Subject Using UIMA to build an NLP system
Date Sun, 26 Apr 2015 08:12:05 GMT
Hi all, 

I am relatively new to UIMA and was wondering whether it would be the right choice for
a project that I am currently working on. In essence, this project deals with a variety of
text classification problems at different levels (document level, paragraph level, sentence
level) using different methods. 

To provide a concrete scenario, would UIMA be useful in modeling the following processing
pipeline, given a corpus consisting of a number of text documents: 

- annotate each doc with meta-data extracted from it, such as publication date
- preprocess the corpus, e.g. by stopword removal and lemmatization
- save intermediate pre-processed and annotated versions of the corpus (so that pre-processing
has to be done only once)
- run LDA (e.g. using Mallet) on the entire training corpus to model topics, with number of
topics ranging, for instance, from 50 to 100
- convert each doc to a feature vector as per the LDA model
- train and test an SVM for supervised text classification (binary classification into "relevant"
vs. "non-relevant") using cross-validation
- store each trained SVM
- write the cross-validation results to a CSV file for further processing
- extract paragraphs from the relevant documents and use them for unsupervised pre-training in a deep
learning architecture (built using e.g. Deeplearning4J)
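
To make the "pre-process only once" step above more concrete, here is a minimal sketch in plain
Java. It is not UIMA code and makes no claim about how UIMA would handle this. The class name
PreprocessCache, the tiny stopword list and the one-file-per-document cache layout are all
assumptions made for illustration, and lemmatization is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PreprocessCache {

    // Tiny illustrative stopword list (an assumption; a real list would be much longer).
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    /** Lowercases, splits on non-letter characters and drops stopwords. No lemmatization. */
    public static List<String> preprocess(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Returns the cached tokens for docId, computing and storing them on first use. */
    public static List<String> loadOrPreprocess(Path cacheDir, String docId, String rawText)
            throws IOException {
        // Hypothetical layout: one token per line in <cacheDir>/<docId>.tokens
        Path cached = cacheDir.resolve(docId + ".tokens");
        if (Files.exists(cached)) {
            return Files.readAllLines(cached, StandardCharsets.UTF_8);
        }
        List<String> tokens = preprocess(rawText);
        Files.createDirectories(cacheDir);
        Files.write(cached, tokens, StandardCharsets.UTF_8);
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        Path cacheDir = Files.createTempDirectory("corpus-cache");
        String raw = "The publication of the report";
        System.out.println(loadOrPreprocess(cacheDir, "doc1", raw)); // computed and stored
        System.out.println(loadOrPreprocess(cacheDir, "doc1", raw)); // read back from the cache
    }
}
```

The cache is keyed by document ID only, so a second call for the same ID returns the stored
tokens even if the raw text has changed. A real pipeline would probably also need to invalidate
the cache when the pre-processing settings change.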

Would UIMA be a good choice to build and manage a project like this? 
What would be the advantages of UIMA compared to using simple shell scripts for "gluing
together" the individual components? 

Thanks a lot. 

Kind regards, 

Martin