uima-user mailing list archives

From Christopher Baechle <cbaec...@my.fau.edu>
Subject Re: How to annotate based on document collection
Date Fri, 06 Nov 2015 15:31:56 GMT
Thanks. That answered my question.

On Fri, Nov 6, 2015 at 10:21 AM, buddha <buddha_314@yahoo.com> wrote:

> UIMA works best when you are processing one document at a time.  My
> suggestion would be to run the initial pipeline to get the correct
> annotations, which I assume are tokens in your case, then save those off
> into some relational table.
>
> From there, you can run the documents through again and load your df
> values as an external resource, then compute the tf, and with it the
> final score, on the second pass.
>
> There are ways to estimate the tf/idf values, but, frankly, the whole
> notion of “document frequency” means you’ve looked at the whole corpus at
> least once.
>
> > On Nov 6, 2015, at 7:12 AM, Christopher Baechle <cbaechle@my.fau.edu> wrote:
> >
> > I am working with an existing project that is built with UIMA. I am
> > trying to create a tf-idf style score that looks at the set of
> > documents as a whole.
> >
> > Since the rest of the project uses UIMA heavily, I would like to
> > implement this as an annotator if possible, rather than a separate
> > program. Is it possible within UIMA to do this?
>
>
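
For anyone finding this thread later, the two-pass idea described above can
be sketched roughly as follows. This is plain Python rather than UIMA code,
and it is only an illustration under some assumptions: the function names
(document_frequencies, tf_idf) and the toy corpus are made up for the
example, each document is assumed to be already tokenized, the df table is
kept in memory instead of a relational table, and the weighting is one
common tf-idf variant (relative term frequency times log(N / df)), not
necessarily the one the project needs.

```python
import math
from collections import Counter


def document_frequencies(corpus):
    """Pass 1: for each term, count how many documents contain it."""
    df = Counter()
    for tokens in corpus:
        # A set ensures each document contributes at most 1 per term.
        df.update(set(tokens))
    return df


def tf_idf(tokens, df, num_docs):
    """Pass 2: score one document's terms against the stored df table."""
    tf = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(num_docs / df[term])
        for term, count in tf.items()
    }


# Hypothetical, already-tokenized corpus (stand-in for token annotations).
corpus = [
    ["uima", "annotator", "token"],
    ["uima", "pipeline", "token", "token"],
    ["corpus", "frequency"],
]

# First pass over the whole collection builds the df table.
df = document_frequencies(corpus)

# Second pass handles one document at a time, using df as a lookup.
scores = [tf_idf(doc, df, len(corpus)) for doc in corpus]
```

In a UIMA setting the first function would correspond to the initial
pipeline run that saves token counts, and the second to an annotator that
reads the saved df values while processing a single document.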
