uima-user mailing list archives

From buddha <buddha_...@yahoo.com.INVALID>
Subject Re: How to annotate based on document collection
Date Fri, 06 Nov 2015 15:21:07 GMT
UIMA works best when you are processing one document at a time.  My suggestion would be
to run the initial pipeline to get the annotations you need, which I assume are tokens in your
case, then save those off into a relational table.

From there, you can run the documents through again, load your document frequency (df)
values as an external resource, and compute the term frequency (tf) on that second pass.
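
To make the two passes concrete, here is a rough sketch in plain Python rather than UIMA code.
It assumes each document is already a list of tokens, and the names (document_frequencies,
tf_idf, corpus) are made up for illustration. In a real pipeline the df table would live in
your relational table or external resource instead of an in-memory Counter, and the scoring
formula here (raw count times log(N/df)) is just one common variant.

```python
import math
from collections import Counter


def document_frequencies(corpus):
    """Pass 1: for each token, count how many documents contain it."""
    df = Counter()
    for tokens in corpus:
        # set() so a token repeated inside one document is counted once
        df.update(set(tokens))
    return df


def tf_idf(tokens, df, n_docs):
    """Pass 2: score a single document against the df table from pass 1.

    Assumes every token in `tokens` was seen in pass 1, so df[t] > 0.
    """
    tf = Counter(tokens)
    return {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}


# Hypothetical toy corpus: three already-tokenized documents.
corpus = [
    ["uima", "annotator", "token"],
    ["uima", "pipeline", "token", "token"],
    ["corpus", "frequency"],
]

df = document_frequencies(corpus)                           # first pass over everything
scores = [tf_idf(doc, df, len(corpus)) for doc in corpus]   # second pass, one doc at a time
```

The point is only that the second function needs nothing but the current document plus the
finished df table, which is what lets it fit the one-document-at-a-time model.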

There are ways to estimate the tf/idf values, but, frankly, the whole notion of “document
frequency” means you’ve looked at the whole corpus at least once.

> On Nov 6, 2015, at 7:12 AM, Christopher Baechle <cbaechle@my.fau.edu> wrote:
> 
> I am working with an existing project that is built with UIMA. I am trying
> to create a tf-idf style score that looks at the set of documents as a
> whole.
> 
> Since the rest of the project uses UIMA heavily, I would like to implement
> this as an annotator if possible, rather than a separate program. Is it
> possible within UIMA to do this?

