uima-user mailing list archives

From Philip Ogren <phi...@ogren.info>
Subject Removing descriptor files from ClearTK
Date Thu, 04 Jun 2009 16:06:38 GMT
I would like to weigh in on the recent discussion (previously titled 
"Parameters in uima descriptors") w.r.t. our thinking about descriptor 
files in our UIMA project, ClearTK.  The last time we got together we 
decided that we were going to move away from providing descriptor files 
for our project and move towards providing static factory methods for 
creating *Description objects (e.g. AnalysisEngineDescription).  If you 
check out the code now (see http://cleartk.googlecode.com), you will see 
that descriptor files are still scattered throughout it and that we have 
started adding these factory methods, so this goal is still a work in 
progress.  These methods will serve two purposes: 1) to allow users to 
instantiate our components directly in Java and 2) to guide users in 
writing descriptor files for our components.  While we understand the 
purpose and necessity of descriptor files, we are not going to provide 
them, for the following reasons:

1) maintaining descriptor files is a giant pain in the butt.  The 
developers of ClearTK are two graduate students and a postdoc and we do 
not have the resources (or patience) to maintain these files.  We have 
found that, as we have evolved and refactored our code, our 
descriptor files constantly break and are absurdly burdensome to 
maintain.  I don't want to call out others in this conversation (please 
chime in as you will!) but I have had a number of conversations with 
developers on several other UIMA projects and I am not alone in my 
loathing of maintaining descriptor files.  The maintenance is 
particularly burdensome for descriptor files that you might create for 
your unit tests.  They are constantly breaking, they are tedious to fix, 
and they discourage code refactoring and evolution by their mere 
presence (let me tell you how I really feel!)

2) We cannot create all possible descriptor files that might be needed 
to use ClearTK in the ways desired by the user.  Our library relies 
heavily on dynamic class loading driven by class names provided in 
configuration parameters.  For example, when you are writing training 
data for a particular machine learning classifier you can specify the 
class name of the data writer to be used (e.g. one for maxent or 
libsvm).  These data writers may require additional configuration 
parameters that must be set in the descriptor file.  Therefore, what 
ends up in the descriptor file is determined by a specific use-case and 
is not constrained to a fixed set of configuration parameters. 
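
To illustrate point 2, here is a minimal sketch of class loading driven 
by a class name given in a configuration parameter.  It uses only the 
Java standard library.  Every name in it (SketchDataWriter, 
SketchSvmWriter, SketchMaxentWriter, SketchWriterLoader, and the PARAM_* 
constants) is hypothetical and is not ClearTK's actual API; a plain Map 
stands in for the settings a descriptor file would carry.

```java
import java.util.Map;

// Hypothetical loader: picks the data writer named in the parameters.
class SketchWriterLoader {
    static final String PARAM_DATA_WRITER =
            SketchWriterLoader.class.getName() + ".PARAM_DATA_WRITER";

    static SketchDataWriter load(Map<String, String> params) {
        String className = params.get(PARAM_DATA_WRITER);
        if (className == null) {
            throw new IllegalArgumentException("missing " + PARAM_DATA_WRITER);
        }
        try {
            // Load the class by name and call its no-argument constructor.
            SketchDataWriter writer = Class.forName(className)
                    .asSubclass(SketchDataWriter.class)
                    .getDeclaredConstructor()
                    .newInstance();
            // The writer reads whatever extra parameters it needs itself.
            writer.configure(params);
            return writer;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of(
                PARAM_DATA_WRITER, SketchSvmWriter.class.getName(),
                SketchSvmWriter.PARAM_COST, "10");
        System.out.println(load(params).describe());
    }
}

// Hypothetical stand-in for a data writer interface.
interface SketchDataWriter {
    void configure(Map<String, String> params);
    String describe();
}

// Hypothetical writer that requires one extra parameter of its own.
class SketchSvmWriter implements SketchDataWriter {
    static final String PARAM_COST =
            SketchSvmWriter.class.getName() + ".PARAM_COST";
    private String cost = "1.0";

    public void configure(Map<String, String> params) {
        cost = params.getOrDefault(PARAM_COST, cost);
    }

    public String describe() {
        return "SketchSvmWriter(cost=" + cost + ")";
    }
}

// Hypothetical writer that needs no extra parameters.
class SketchMaxentWriter implements SketchDataWriter {
    public void configure(Map<String, String> params) { }

    public String describe() {
        return "SketchMaxentWriter()";
    }
}
```

The set of parameters that matters depends on which writer class is 
named, which is why a fixed, pre-written descriptor cannot cover every 
combination.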

It is our goal to make it easy for users of ClearTK to write 
descriptor files that are specific to their own use case by 1) 
creating factory methods that demonstrate common ways that our 
components can be described (i.e. the user can study these methods when 
writing their own descriptor file), 2) naming our configuration 
parameters according to a strict naming convention that points the user 
to the canonical definition and documentation for a configuration 
parameter (e.g. 
"org.cleartk.classifier.InstanceConsumer.PARAM_ANNOTATION_HANDLER"), and 
3) providing documentation on how to do this.
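
As a rough sketch of how such a naming convention can work, the example 
below gives a parameter the name "<defining class>.<constant name>", so 
the name itself leads back to the constant where the parameter is 
declared and documented.  It uses only the Java standard library.  
SketchConsumer, SketchParamNames and createSettings are hypothetical 
names, and the returned Map is only a stand-in for what a real factory 
method would put into a *Description object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that follows a parameter name back to its constant.
class SketchParamNames {
    // Split "<defining class>.<CONSTANT>" at the last dot and load the class.
    static Class<?> definingClass(String paramName) {
        int dot = paramName.lastIndexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("not qualified: " + paramName);
        }
        try {
            Class<?> owner = Class.forName(paramName.substring(0, dot));
            // The constant must exist and must hold the parameter name itself.
            Object value = owner
                    .getDeclaredField(paramName.substring(dot + 1))
                    .get(null);
            if (!paramName.equals(value)) {
                throw new IllegalArgumentException("name mismatch: " + paramName);
            }
            return owner;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no such parameter: " + paramName, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings =
                SketchConsumer.createSettings("my.pkg.MyHandler");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue()
                    + " (declared in " + definingClass(e.getKey()).getName() + ")");
        }
    }
}

// Hypothetical component whose parameter follows the naming convention.
class SketchConsumer {
    /** Class name of the annotation handler to instantiate. */
    public static final String PARAM_ANNOTATION_HANDLER =
            SketchConsumer.class.getName() + ".PARAM_ANNOTATION_HANDLER";

    // Static factory in place of a descriptor file: it returns the
    // name/value settings that a descriptor would otherwise contain.
    static Map<String, String> createSettings(String handlerClassName) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(PARAM_ANNOTATION_HANDLER, handlerClassName);
        return settings;
    }
}
```

A user writing a descriptor by hand can read createSettings to see which 
parameters to set, then follow each parameter name to the class and 
constant that document it.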

Here are a few more points that I want to make:

- we are not ruling out providing some descriptor files - esp. for 
configurations that we think/hope will be useful to have for running 
some of our components "out-of-the-box".  Much of our code is intended 
as a framework for users of ClearTK to create their own components using 
common machine learning approaches.  While, we could anticipate the 
general structure of descriptor files for such user-generated components 
(and we have tried) we have decided that descriptors such as these in 
particular will no longer be provided in ClearTK. 
- we are not getting rid of type system descriptor files. 
- we are not saying that descriptor files are of no use.  They are 
clearly very nice to have for sharing and for deployment.  However, I do 
not use them myself for setting up and running experiments in my 
research, and we feel that, for the above reasons, we do not have to 
provide them. 

I hope this clarifies the discussion.

